this can input dataframe with categorical data and return a dataframe with binary values. variable values are encoded into column names in the resulting dataframe. more
You can easily do this though,
df.apply(LabelEncoder().fit_transform)
EDIT2:
In scikit-learn 0.20, the recommended way is
OneHotEncoder().fit_transform(df)
as the OneHotEncoder now supports string input. Applying OneHotEncoder only to certain columns is possible with the ColumnTransformer.
EDIT:
Since this original answer is over a year ago, and generated many upvotes (including a bounty), I should probably extend this further.
For inverse_transform and transform, you have to do a little bit of hack.
from collections import defaultdict
d = defaultdict(LabelEncoder)
With this, you now retain all columns Label Encoder as dictionary.
# Encoding the variable
fit = df.apply(lambda x: d[x.name].fit_transform(x))
# Inverse the encoded
fit.apply(lambda x: d[x.name].inverse_transform(x))
# Using the dictionary to label future data
df.apply(lambda x: d[x.name].transform(x))
This post was edited by Viaan Prakash at August 14, 2021 12:45 PM IST
pandas.get_dummies(input_df)
this can input dataframe with categorical data and return a dataframe with binary values. variable values are encoded into column names in the resulting dataframe. more