Insert or delete a step in scikit-learn Pipeline

  • Is it possible to delete or insert a step in a sklearn.pipeline.Pipeline object?

    I am trying to run a grid search with and without one step in the Pipeline object, and I am wondering whether I can insert or delete a step in the pipeline. In the Pipeline source code I saw that there is a self.steps object holding all the steps, and that the steps can also be looked up by name through named_steps. Before modifying it, I want to make sure I do not cause unexpected effects.

    Here is some example code:
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC
    from sklearn.decomposition import PCA
    estimators = [('reduce_dim', PCA()), ('svm', SVC())]
    clf = Pipeline(estimators)
    clf

    Is it possible to do something like steps = clf.named_steps, then insert into or delete from the result? Would that cause undesired effects on the clf object?

      September 4, 2021 12:57 AM IST
  • I see that everyone mentioned only the delete step. In case you also want to insert a step in the pipeline:
    pipe.steps.append(('step name', transformer()))
    pipe.steps behaves like an ordinary list, so you can also insert an item at a specific position:
    pipe.steps.insert(1, ('estimator', transformer()))  # insert as the second step
    Each step is a (name, estimator) tuple. A step appended at the end becomes the new final estimator, so the step that used to be last must then be a transformer.
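    To make the insertion concrete, here is a minimal sketch. The StandardScaler step, the step name "scale", and the iris data are assumptions chosen for illustration; they are not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("reduce_dim", PCA()), ("svm", SVC())])

# Insert a scaler as the first step; each step is a (name, estimator) tuple.
# "scale" and StandardScaler are illustrative choices, not from the thread.
pipe.steps.insert(0, ("scale", StandardScaler()))

step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['scale', 'reduce_dim', 'svm']

# The pipeline validates its steps when fit is called, so fitting is a
# quick way to check that the modified pipeline is still usable.
pipe.fit(X, y)
```

    Step names must stay unique, and the modification should happen before fit is called, since an already fitted pipeline would not have fitted the new step.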
      September 9, 2021 4:25 PM IST
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC
    from sklearn.decomposition import PCA
    estimators = [('reduce_dim', PCA()), ('svm', SVC())]
    clf = Pipeline(estimators)
    clf
      September 14, 2021 4:32 PM IST
  • Based on rudimentary testing you can safely remove a step from a scikit-learn pipeline just like you would any list item, with a simple

    clf_pipeline.steps.pop(n)

    where n is the zero-based position of the step you are trying to remove.
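    A short sketch of the pop approach, assuming a three-step pipeline like the one used elsewhere in this thread (the step names are illustrative):

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

clf_pipeline = Pipeline(
    [("reduce_dim", PCA()), ("poly", PolynomialFeatures()), ("svm", SVC())]
)

# pop(n) removes the step at zero-based position n and returns
# its (name, estimator) tuple, exactly like list.pop.
removed_name, removed_estimator = clf_pipeline.steps.pop(1)

remaining = [name for name, _ in clf_pipeline.steps]
print(removed_name, remaining)  # poly ['reduce_dim', 'svm']
```

    Note that pop changes the pipeline in place, so keep the returned tuple if the step may need to be put back later.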

     
      September 17, 2021 1:03 PM IST
  • Just chiming in because I feel like the other answers answered the question of adding steps to a pipeline really well, but didn't really cover how to delete a step from a pipeline.

    Be careful with my approach, though: list slices are zero-based and exclude the end index, which can be confusing here.

    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import PolynomialFeatures
    
    estimators = [('reduce_dim', PCA()), ('poly', PolynomialFeatures()), ('svm', SVC())]
    clf = Pipeline(estimators)

    If you want to create a pipeline with just the PCA and PolynomialFeatures steps, you can slice the steps list by index and pass the result to Pipeline:

    clf1 = Pipeline(clf.steps[0:2])

    Want to use just steps 2 and 3? Watch out: the slice bounds are zero-based and exclude the end index, so they do not line up with the step numbers.

    clf2 = Pipeline(clf.steps[1:3])
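    The slice semantics can be checked with a small sketch. It also shows that a Pipeline can be sliced directly, which I believe is supported from scikit-learn 0.21 onward; treat that version number as an assumption and check your installed release.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

clf = Pipeline(
    [("reduce_dim", PCA()), ("poly", PolynomialFeatures()), ("svm", SVC())]
)

# Slices are half-open: steps[1:3] takes positions 1 and 2,
# i.e. the second and third steps.
clf2 = Pipeline(clf.steps[1:3])
names_from_list_slice = [name for name, _ in clf2.steps]

# Slicing the Pipeline object itself returns a new Pipeline.
clf2_alt = clf[1:3]
names_from_pipeline_slice = [name for name, _ in clf2_alt.steps]

print(names_from_list_slice)      # ['poly', 'svm']
print(names_from_pipeline_slice)  # ['poly', 'svm']

# Both variants reuse the original estimator objects instead of copying them.
shares_estimators = clf2.steps[0][1] is clf.steps[1][1]
print(shares_estimators)  # True
```

    Because the estimators are shared, fitting the sub-pipeline also changes the objects held by the original pipeline. Wrapping the result in sklearn.base.clone gives independent, unfitted copies if that matters.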
    

     

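    Want to use just steps 1 and 3? Adding the two step tuples with + concatenates them into a single four-element tuple, which Pipeline does not accept as a list of steps. Put the two tuples in a list instead:

    clf3 = Pipeline([clf.steps[0], clf.steps[2]])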
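    For the original goal of grid searching with and without a step, rebuilding pipelines by hand may not be necessary. A step can be replaced by the string 'passthrough', and the step name itself can be used as a grid parameter. The sketch below assumes the iris data, cv=3, and PCA with 2 components purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

clf = Pipeline([("reduce_dim", PCA()), ("svm", SVC())])

# Each candidate replaces the whole 'reduce_dim' step:
# 'passthrough' skips the step, PCA(n_components=2) keeps it.
param_grid = {"reduce_dim": ["passthrough", PCA(n_components=2)]}

search = GridSearchCV(clf, param_grid, cv=3)
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])
print(n_candidates)  # 2
print(search.best_params_)
```

    This keeps a single pipeline definition and lets the search decide whether the step helps, instead of maintaining two separately constructed pipelines.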
    



      September 18, 2021 12:45 PM IST