Tags: python-3.x, dataframe, preprocessor, sklearn-pandas, label-encoding

How to decode LabelEncoder implemented column in pandas dataframe?


I have a dataset on which I was practicing feature engineering by converting categorical (object) columns to numbers, using the following code:

import pandas as pd 
import numpy as np
from sklearn import preprocessing
df = pd.read_csv(r'train.csv',index_col='Id')
print(df.shape)
df.head()
colsNum = df.select_dtypes(np.number).columns
colsObj = df.columns.difference(colsNum)

df[colsNum] = df[colsNum].fillna(df[colsNum].mean()//1)
df[colsObj] = df[colsObj].fillna(df[colsObj].mode().iloc[0])

label_encoder = preprocessing.LabelEncoder() 
for col in colsObj:
    df[col] = label_encoder.fit_transform(df[col])
df.head()
for col in colsObj:
    df[col] = label_encoder.inverse_transform(df[col])
df.head()

But inverse_transform() does not return the original dataset here. Please help me!


Solution

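  • You need one encoder per column; you cannot reuse the same encoder for every column. Each call to fit_transform() refits the encoder, so after the loop it only remembers the classes of the last column, and inverse_transform() then maps every column back through that last column's labels: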

    import pandas as pd
    import numpy as np
    from sklearn import preprocessing
    df = pd.read_csv(r'train.csv', index_col='Id')
    print(df.shape)
    
    colsNum = df.select_dtypes(np.number).columns
    colsObj = df.columns.difference(colsNum)
    
    df[colsNum] = df[colsNum].fillna(df[colsNum].mean()//1)
    df[colsObj] = df[colsObj].fillna(df[colsObj].mode().iloc[0])
    print(df.head())
    
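    # Keep a separate fitted LabelEncoder for each categorical column
    encoder = {}
    
    for col in colsObj:
        encoder[col] = preprocessing.LabelEncoder()
        df[col] = encoder[col].fit_transform(df[col])
    print(df.head())
    
    # Decode each column with the encoder that was fitted on it
    for col in colsObj:
        df[col] = encoder[col].inverse_transform(df[col])
    print(df.head())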
    

    You can also check out this answer for further details.
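To see why the shared encoder fails, here is a minimal sketch on a small made-up DataFrame. The columns `color` and `size` and their values are assumptions for illustration; they are not taken from `train.csv`. It shows that a reused `LabelEncoder` keeps only the last column's `classes_`, and that one encoder per column round-trips correctly.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the categorical columns of train.csv
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
})
original = df.copy()

# Shared encoder: every fit_transform call refits it and replaces classes_
shared = LabelEncoder()
shared_encoded = df.copy()
for col in df.columns:
    shared_encoded[col] = shared.fit_transform(df[col])

# Only the labels of the last column ("size") are still stored
shared_classes = list(shared.classes_)

# Decoding "color" with the shared encoder yields labels from "size"
wrong_colors = list(shared.inverse_transform(shared_encoded["color"]))

# One encoder per column: each keeps the classes of its own column
encoders = {}
encoded = df.copy()
for col in df.columns:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(df[col])

decoded = encoded.copy()
for col in df.columns:
    decoded[col] = encoders[col].inverse_transform(encoded[col])

# True when every column decodes back to its original values
round_trip_ok = all(
    decoded[col].tolist() == original[col].tolist() for col in df.columns
)

print(shared_classes)
print(wrong_colors)
print(round_trip_ok)
```

In this toy case the shared encoder does not even raise an error, because both columns happen to have three distinct values; it silently returns the wrong labels. If the columns had different numbers of classes, `inverse_transform()` could instead fail on codes it has never seen.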