
How do I apply label encoding on multiple columns?

  • How do I apply label encoding on multiple columns?
      August 13, 2021 3:49 PM IST
  • If your DataFrame contains both numerical and categorical data, you can loop over its columns and encode only the categorical (object) ones. Here X is a DataFrame with both categorical and numerical variables:

    from sklearn import preprocessing

    le = preprocessing.LabelEncoder()

    # Encode only the object (string) columns; numeric columns are left unchanged
    for i in range(X.shape[1]):
        if X.dtypes.iloc[i] == 'object':
            X[X.columns[i]] = le.fit_transform(X[X.columns[i]])
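    The same idea as a minimal runnable sketch, assuming a small hypothetical DataFrame X (the column names and values are made up for illustration). Instead of comparing against 'object', it uses pandas' is_numeric_dtype, so every non-numeric column is encoded and numeric ones are skipped:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.preprocessing import LabelEncoder

# Hypothetical mixed-type DataFrame, for illustration only
X = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [10, 20, 15, 20],
})

le = LabelEncoder()

# Encode every non-numeric column in place; numeric columns such as "price" are skipped
for col in X.columns:
    if not is_numeric_dtype(X[col]):
        X[col] = le.fit_transform(X[col])

print(X)
```

    Labels are numbered in sorted order within each column, so "blue", "green", "red" become 0, 1, 2.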
      August 21, 2021 11:58 AM IST
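  • Use DataFrame.apply to run LabelEncoder on every column:

    from sklearn.preprocessing import LabelEncoder

    df = df.apply(LabelEncoder().fit_transform)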
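    A small sketch of what the one-liner does, using a hypothetical DataFrame: apply passes each column to fit_transform, so every column is refit and encoded independently, including numeric ones. Restricting it to a list of categorical columns (hypothetical names here) keeps the numbers intact:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical DataFrame, for illustration only
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "S"],
    "price": [10, 5, 10],
})

# Each column is passed to fit_transform separately; the result is a new DataFrame
encoded = df.apply(LabelEncoder().fit_transform)

# The numeric "price" column is re-encoded as well (5 -> 0, 10 -> 1)
print(encoded)

# Applying it only to the string columns leaves "price" untouched
cat_cols = ["color", "size"]
df[cat_cols] = df[cat_cols].apply(LabelEncoder().fit_transform)
print(df)
```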
      August 24, 2021 11:47 PM IST
  • I don't think the large dataset is affecting your outcome. The purpose of LabelEncoder is to encode the prediction targets (in your case, I'm assuming, the Target column). From the User Guide:

    LabelEncoder is a utility class to help normalize labels such that they contain only values between 0 and n_classes-1.

    Here's an example; notice that I changed the values of Target in your example CountryDF, just for demonstration purposes:

    from sklearn.preprocessing import LabelEncoder
    import numpy as np
    import pandas as pd
    
    CountryDF = pd.DataFrame(
        [['CN_Milk powder_Incl_Others', np.nan, 'Shanghai Hyper total', 'O.Brand', np.nan, np.nan, 'Hi Cal Adult Milk Powders- C1'],
         ['CN_Milk powder_Incl_Others', 'Elder', 'Shanghai Hyper total', 'O.Brand', np.nan, np.nan, 'Hi Cal Adult Milk Powders- C1'],
         ['CN_Milk powder_Incl_Others', 'Others', 'Shanghai Hyper total', 'O.Brand', np.nan, np.nan, 'Hi Cal Adult Milk Powders- C1'],
         ['CN_Milk powder_Incl_Others', 'Lady', 'Shanghai Hyper total', 'O.Brand', np.nan, np.nan, 'Hi Cal Adult Milk Powders- C1'],
         ['CN_Milk powder_Incl_Others', np.nan, 'Shanghai Hyper total', 'O.Brand', 'S_B1', np.nan, 'Hi Cal Adult Milk Powders- C1'],
         ['CN_Milk powder_Incl_Others', np.nan, 'Shanghai Hyper total', 'O.Brand', 'S_B2', np.nan, 'Hi Cal Adult Milk Powders- C1']],
        columns=['Database', 'Target', 'Market_Description', 'Brand', 'Sub_Brand', 'Category', 'Class_Category'])


    First, initialize the LabelEncoder, then fit and transform the data (while assigning the transformed data to a new column).

    le = LabelEncoder()  # initialize the LabelEncoder once

    # Create a new column with the transformed values
    CountryDF['EncodedTarget'] = le.fit_transform(CountryDF['Target'])


    Notice that the last column, EncodedTarget, is an encoded copy of Target:


    CountryDF
    
       Database                    Target  Market_Description    Brand    Sub_Brand  Category  Class_Category                 EncodedTarget
    0  CN_Milk powder_Incl_Others  NaN     Shanghai Hyper total  O.Brand  NaN        NaN       Hi Cal Adult Milk Powders- C1  0
    1  CN_Milk powder_Incl_Others  Elder   Shanghai Hyper total  O.Brand  NaN        NaN       Hi Cal Adult Milk Powders- C1  1
    2  CN_Milk powder_Incl_Others  Others  Shanghai Hyper total  O.Brand  NaN        NaN       Hi Cal Adult Milk Powders- C1  3
    3  CN_Milk powder_Incl_Others  Lady    Shanghai Hyper total  O.Brand  NaN        NaN       Hi Cal Adult Milk Powders- C1  2
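    The fitted encoder keeps the mapping it learned, so the codes can be turned back into labels. A minimal sketch on a hypothetical NaN-free Target column (the values are made up, and NaN is left out to keep the mapping easy to read):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical NaN-free target column, for illustration only
target = pd.Series(["Elder", "Others", "Lady", "Elder"], name="Target")

le = LabelEncoder()
encoded = le.fit_transform(target)

# classes_ holds the sorted unique labels; each label's code is its index in classes_
print(le.classes_)

# inverse_transform maps the integer codes back to the original labels
decoded = le.inverse_transform(encoded)
print(list(zip(encoded, decoded)))
```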


    I hope this helps clear up LabelEncoder.
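    Since LabelEncoder is intended for the target, scikit-learn's OrdinalEncoder is the counterpart for feature columns: it encodes several columns in one call and keeps one category list per column in categories_. A minimal sketch with hypothetical data; note that OrdinalEncoder returns floats by default:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical feature columns, for illustration only
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "L"],
})

cols = ["color", "size"]
enc = OrdinalEncoder()

# Each column gets its own sorted category list, stored in enc.categories_
df[cols] = enc.fit_transform(df[cols])
print(df)
print(enc.categories_)
```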

      August 14, 2021 12:49 PM IST
  • If your DataFrame contains both numerical and categorical data, you can loop over its columns and encode only the categorical (object) ones. Here X is a DataFrame with both categorical and numerical variables:

    from sklearn import preprocessing

    le = preprocessing.LabelEncoder()

    # Encode only the object (string) columns; numeric columns are left unchanged
    for i in range(X.shape[1]):
        if X.dtypes.iloc[i] == 'object':
            X[X.columns[i]] = le.fit_transform(X[X.columns[i]])

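    Note: this technique is fine if you don't need to convert the values back. Because the same encoder is refit on every column, le.classes_ only keeps the mapping for the last column it encoded, so the other columns can't be decoded with le.inverse_transform.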
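    If you do need to convert the values back, one option is to keep a separate fitted LabelEncoder per column, for example in a dict. A minimal sketch; the DataFrame and column names are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical DataFrame, for illustration only
X = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "L"],
    "price": [10, 20, 15],
})

# One fitted encoder per categorical column, so every mapping is preserved
encoders = {}
for col in ["color", "size"]:
    encoders[col] = LabelEncoder()
    X[col] = encoders[col].fit_transform(X[col])

# Later, any column can be decoded with its own encoder
X["color"] = encoders["color"].inverse_transform(X["color"])
print(X)
```

    The cost is holding several small encoder objects, but each column stays reversible independently of the others.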

     
      August 14, 2021 10:07 PM IST