
fit_transform() takes 2 positional arguments but 3 were given with LabelBinarizer

  • I am totally new to machine learning, and I have been working with an unsupervised learning technique.

    Screenshot: sample data (after all cleaning)

    I have built these two pipelines to clean the data:

    num_attribs = list(housing_num)
    cat_attribs = ["ocean_proximity"]
    
    print(type(num_attribs))
    
    num_pipeline = Pipeline([
        ('selector', DataFrameSelector(num_attribs)),
        ('imputer', Imputer(strategy="median")),
        ('attribs_adder', CombinedAttributesAdder()),
        ('std_scaler', StandardScaler()),
    ])
    
    cat_pipeline = Pipeline([
        ('selector', DataFrameSelector(cat_attribs)),
        ('label_binarizer', LabelBinarizer())
    ])

    Then I combined these two pipelines with a FeatureUnion; the code is shown below:

    from sklearn.pipeline import FeatureUnion
    
    full_pipeline = FeatureUnion(transformer_list=[
            ("num_pipeline", num_pipeline),
            ("cat_pipeline", cat_pipeline),
        ])

    Now I am trying to call fit_transform on the data, but it raises an error.

    Code for Transformation:

    housing_prepared = full_pipeline.fit_transform(housing)
    housing_prepared

    Error message:

    fit_transform() takes 2 positional arguments but 3 were given

     




    This post was edited by Sai Anirudh at December 3, 2020 2:17 PM IST
      December 3, 2020 1:05 PM IST
    0
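  • LabelBinarizer's fit_transform accepts only one input besides self, but Pipeline also passes y to it. You can wrap it in a custom binarizer that accepts the extra argument, like this:
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.preprocessing import LabelBinarizer

    class CustomLabelBinarizer(BaseEstimator, TransformerMixin):
        def __init__(self, sparse_output=False):
            self.sparse_output = sparse_output
        def fit(self, X, y=None):
            # Learn the classes once, so transform() reuses the same encoding
            self.enc_ = LabelBinarizer(sparse_output=self.sparse_output)
            self.enc_.fit(X)
            return self
        def transform(self, X, y=None):
            return self.enc_.transform(X)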
    
    num_attribs = list(housing_num)
    cat_attribs = ['ocean_proximity']
    
    num_pipeline = Pipeline([
        ('selector', DataFrameSelector(num_attribs)),
        ('imputer', Imputer(strategy='median')),
        ('attribs_adder', CombinedAttributesAdder()),
        ('std_scaler', StandardScaler())
    ])
    
    cat_pipeline = Pipeline([
        ('selector', DataFrameSelector(cat_attribs)),
        ('label_binarizer', CustomLabelBinarizer())
    ])
    
    full_pipeline = FeatureUnion(transformer_list=[
        ('num_pipeline', num_pipeline),
        ('cat_pipeline', cat_pipeline)
    ])
    
    housing_prepared = full_pipeline.fit_transform(new_housing)
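
    One thing worth checking with any wrapper of this kind is that the classes are learned in fit() and only reused in transform(). If the encoder is refit on every call, the output depends on whichever batch is passed in. A minimal sketch of the difference, using made-up category values (the variable names here are illustrative, not from the question):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Made-up category values, for illustration only.
train = np.array(["INLAND", "NEAR BAY", "ISLAND", "INLAND"])
new_batch = np.array(["ISLAND", "INLAND"])

# Fit once on the training data, then reuse the learned classes.
fitted = LabelBinarizer().fit(train)
reused = fitted.transform(new_batch)

# Refit on the new batch alone: only two classes are seen,
# and LabelBinarizer encodes a two-class problem as a single column.
refit = LabelBinarizer().fit_transform(new_batch)

print(reused.shape)  # one column per class seen in training
print(refit.shape)   # a single column, inconsistent with training
```

    With the classes learned once, the new batch keeps the same three columns as the training data; refitting silently changes the width of the output.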
      July 31, 2021 11:42 PM IST
    0
  • The Problem:

    Pipeline calls the fit_transform method of its last step with both X and y, so it assumes LabelBinarizer's fit_transform is defined to take three positional arguments:

    def fit_transform(self, x, y):
        ...rest of the code

    while it is defined to take only two:

    def fit_transform(self, x):
        ...rest of the code
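
    The mismatch can be reproduced without scikit-learn at all. The sketch below uses two hypothetical stand-in classes (TinyBinarizer and TinyPipeline are made up for illustration and are not the library's implementation) to show how forwarding y to a method that only expects one input produces this kind of TypeError:

```python
class TinyBinarizer:
    """Hypothetical stand-in for an encoder whose fit_transform takes one input."""

    def fit_transform(self, y):
        classes = sorted(set(y))
        return [[int(label == c) for c in classes] for label in y]


class TinyPipeline:
    """Hypothetical stand-in for a pipeline that always forwards X and y."""

    def __init__(self, step):
        self.step = step

    def fit_transform(self, X, y=None):
        # The bound instance, X and y add up to three positional arguments.
        return self.step.fit_transform(X, y)


try:
    TinyPipeline(TinyBinarizer()).fit_transform(["INLAND", "ISLAND"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

    Even though the caller never supplies y, the pipeline forwards its default of None, and that is the third positional argument the encoder rejects.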

    Possible Solution:

    This can be solved by making a custom transformer whose fit and transform methods accept the extra argument:

    1. Import and make a new class:

      from sklearn.base import TransformerMixin  # gives fit_transform method for free
      from sklearn.preprocessing import LabelBinarizer

      class MyLabelBinarizer(TransformerMixin):
          def __init__(self, *args, **kwargs):
              self.encoder = LabelBinarizer(*args, **kwargs)
          def fit(self, x, y=None):
              # y is accepted only for pipeline compatibility and is ignored
              self.encoder.fit(x)
              return self
          def transform(self, x, y=None):
              return self.encoder.transform(x)
    2. Keep the rest of your code the same, but use the class we created, MyLabelBinarizer(), instead of LabelBinarizer().

    Note: If you want access to LabelBinarizer Attributes (e.g. classes_), add the following line to the fit method:
    self.classes_, self.y_type_, self.sparse_input_ = self.encoder.classes_, self.encoder.y_type_, self.encoder.sparse_input_
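
    As a rough usage sketch of that note: once the wrapper re-exports classes_, the learned categories can be read back from the pipeline step to see which output column belongs to which category. The class name PipelineLabelBinarizer and the category values below are assumptions made for illustration; explicit __init__ parameters are used here so the class can also inherit from BaseEstimator.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer


class PipelineLabelBinarizer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: LabelBinarizer with a pipeline-friendly signature."""

    def __init__(self, sparse_output=False):
        self.sparse_output = sparse_output

    def fit(self, X, y=None):
        # y is ignored; it is accepted only because Pipeline passes it along.
        self.encoder_ = LabelBinarizer(sparse_output=self.sparse_output)
        self.encoder_.fit(X)
        # Re-export the learned classes so callers can read them from the wrapper.
        self.classes_ = self.encoder_.classes_
        return self

    def transform(self, X):
        return self.encoder_.transform(X)


# Made-up category values, for illustration only.
categories = np.array(["INLAND", "NEAR BAY", "ISLAND", "INLAND"])

pipe = Pipeline([("label_binarizer", PipelineLabelBinarizer())])
encoded = pipe.fit_transform(categories)

# One output column per learned class, in this order.
learned_classes = list(pipe.named_steps["label_binarizer"].classes_)
print(learned_classes)
print(encoded.shape)
```

    The step name passed to Pipeline is the key used in named_steps, so the lookup has to match whatever name your own pipeline uses.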
      December 24, 2020 11:21 AM IST
    0
  • I had the same issue and resolved it by using DataFrameMapper (requires installing sklearn_pandas):

    from sklearn_pandas import DataFrameMapper
    cat_pipeline = Pipeline([
        ('label_binarizer', DataFrameMapper([(cat_attribs, LabelBinarizer())])),
    ])
      November 24, 2021 11:58 AM IST
    0