python pandas scikit-learn label-encoding

Iterating over a DataFrame's columns using a list of column names in Python


I'm trying to label-encode particular columns of a DataFrame. I have stored those column names in a list (cat_features). Now I want to use a for loop to iterate over the elements of this list (which are strings) and use them to access the DataFrame's columns, but it raises:

TypeError: argument must be a string or number

Since I'm accessing an element of the list, which is already a string, I don't understand why it throws that error. Please help me understand why it doesn't work and what I can do to make it work.

from sklearn.preprocessing import LabelEncoder

# Categorical features: every feature that is not scaled
cat_features = [x for x in features if x not in features_to_scale]

# Fit a fresh encoder on each categorical column
for feature in cat_features:
    le = LabelEncoder()
    dataframe[feature] = le.fit_transform(dataframe[feature])

Solution

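  • The error typically means that one or more of your columns holds values the label encoder cannot sort or hash together, for example a mix of strings and numbers (including NaN for missing values), or a list/tuple/set or something similar. The column name you pass is fine; the problem is the column's contents. You will need to convert such values to strings before you can apply a label encoder.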

    Also, instead of a loop, you can first select the columns you need from your data frame and then use the apply function:

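    from sklearn.preprocessing import LabelEncoder

    df = main_df[cat_features]
    df = df.astype(str)     # Cast each column to str, since the label encoder can't handle lists/tuples/sets or mixed types

    lb = LabelEncoder()
    df = df.apply(lb.fit_transform)     # apply() returns a new frame, so assign the result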
    

    Later you can combine this data frame with the remaining continuous features.
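
To make the steps above concrete, here is a minimal end-to-end sketch. The sample data and the column names (grade, tags, age) are hypothetical and only stand in for your real frame. The encoders dictionary is also an assumption on my part: it keeps one fitted encoder per column, which the apply version does not do because it reuses a single encoder. The first part lists the Python types found in each categorical column, which is a quick way to spot the column that triggers the error.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: "grade" mixes strings with an int, "tags" holds lists,
# and "age" is a continuous feature that should not be label-encoded.
main_df = pd.DataFrame({
    "grade": ["A", "B", 3, "A"],
    "tags": [["a"], ["b"], ["a"], ["c"]],
    "age": [31.0, 45.5, 22.0, 38.0],
})
features = list(main_df.columns)
features_to_scale = ["age"]
cat_features = [x for x in features if x not in features_to_scale]

# Diagnose: collect the Python type names present in each categorical column.
# More than one type, or a container type such as list, is a likely culprit.
types_per_column = {
    col: sorted({type(v).__name__ for v in main_df[col]})
    for col in cat_features
}
print(types_per_column)

# Fix: cast to str, then fit a separate encoder per column so that each
# column's fitted classes remain available (e.g. for inverse_transform).
encoders = {}
encoded = main_df[cat_features].astype(str)
for col in cat_features:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(encoded[col])

# Recombine with the continuous features; both frames share the same index.
result = pd.concat([encoded, main_df[features_to_scale]], axis=1)
print(result)
```

Keeping the fitted encoders around matters only if you need to map the integer codes back to the original labels later; if you don't, the shorter apply version is enough.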