python pandas preprocessor

How to find out which category each numeric label corresponds to when using cat.codes (after converting categorical features to numeric values)?


I'm working on an ML project and am doing some preliminary feature selection (when I later train my actual machine learning model, I intend to use one-hot encoding).

To do the feature selection I need to convert my categorical variables into numeric codes, like female:0, male:1, other:2. I can't do it manually because I have too many features and values. I'm trying to use cat.codes, but I can't get it to tell me what each code corresponds to. E.g. does 0 correspond to male, female, or other?

I've tried 2 methods, but neither seems to work:

#Example data
import pandas as pd
data = [[14, "Male", "employed"], [89, "Female", "student"], [48, "Other", "employed"]]
df = pd.DataFrame(data, columns=['Age', 'Gender', 'Occupation'])

#Convert categorical feats to numeric values
categorical_feat = ["Gender", "Occupation"]
for col in categorical_feat:
    df[col] = df[col].astype("category").cat.codes

#Trying to find out what the numeric values correspond to:
df["Gender"].cat.categories[0]   #AttributeError: Can only use .cat accessor with a 'category' dtype
df["Gender"].astype("category").cat.categories[0]    #output is 0, which isn't what I want. I'm expecting "Male", "Female" or "Other"


Solution

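  • Once a column has been overwritten with cat.codes, it holds plain integers and the original labels are gone, which is why both attempts in the question fail. Convert to the category dtype first, save the code-to-label mapping from cat.categories, and only then replace the column with its codes. Here is one way, which you can probably adapt to suit: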

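    # One {code: label} dict per categorical column, in the same order as categorical_feat
    cat_list = []

    categorical_feat = ["Gender", "Occupation"]
    for col in categorical_feat:
        df[col] = df[col].astype("category")
        # Save the mapping while the column still holds its labels
        cat_list.append(dict(enumerate(df[col].cat.categories)))
        # Only now replace the labels with their integer codes
        df[col] = df[col].cat.codes

    # Print the saved mapping for each column
    for idx, name in enumerate(categorical_feat):
        print(name)
        print(cat_list[idx])

    print(df)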
    

    gives:

    Gender
    {0: 'Female', 1: 'Male', 2: 'Other'}
    Occupation
    {0: 'employed', 1: 'student'}
    
       Age  Gender  Occupation
    0   14       1           0
    1   89       0           1
    2   48       2           0
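
  • A variation on the above, as a minimal sketch rather than the only way to do it: keep the mappings in a dict keyed by column name instead of a list, so a column's mapping can be looked up directly and used to decode the codes back into labels. The names code_maps and decoded are made up for this illustration. It assumes the same example data as the question.

```python
import pandas as pd

# Same example data as in the question
data = [[14, "Male", "employed"], [89, "Female", "student"], [48, "Other", "employed"]]
df = pd.DataFrame(data, columns=["Age", "Gender", "Occupation"])

categorical_feat = ["Gender", "Occupation"]
code_maps = {}  # illustrative name: column name -> {code: label}

for col in categorical_feat:
    as_cat = df[col].astype("category")
    # Capture the code -> label mapping before the labels are overwritten
    code_maps[col] = dict(enumerate(as_cat.cat.categories))
    df[col] = as_cat.cat.codes

# Look up the mapping for one column
print(code_maps["Gender"])

# Decode the integer codes back into the original labels
decoded = df["Gender"].map(code_maps["Gender"])
print(decoded.tolist())
```

    Note that cat.codes uses -1 for missing values. That code has no entry in the mapping, so it decodes back to NaN with map.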