
Use the same category labeling criteria on two different DataFrames


I have a DataFrame that contains a categorical feature, which I have encoded in the following way:

df['categorical_feature'] = df['categorical_feature'].astype('category')
df['labels'] = df['categorical_feature'].cat.codes

If I apply the same code to another DataFrame with the same categorical field, the mapping is shuffled, but I need it to be consistent with the first DataFrame.

Is there a way to apply the same category-to-label mapping to another DataFrame that has the same categorical values?
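Why the codes can differ: `cat.codes` are positions in `cat.categories`, and `astype('category')` builds those categories only from the values present in that particular DataFrame (sorted, when the values are sortable). If the second DataFrame is missing a category, or contains an extra one, the same value can receive a different code. A minimal sketch with hypothetical example data:

```python
import pandas as pd

# Hypothetical data: the second DataFrame does not contain category 'a'
df = pd.DataFrame({'categorical_feature': ['b', 'a', 'c']})
df2 = pd.DataFrame({'categorical_feature': ['b', 'c']})

# Each call infers its own categories from the values it sees
codes1 = df['categorical_feature'].astype('category').cat.codes
codes2 = df2['categorical_feature'].astype('category').cat.codes

# In df the categories are ['a', 'b', 'c'], so 'b' -> 1 and 'c' -> 2
# In df2 the categories are ['b', 'c'], so 'b' -> 0 and 'c' -> 1
print(codes1.tolist(), codes2.tolist())
```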


Solution

  • I think you are looking for pd.Series.map(), which maps each category to its label using a dictionary of category: label pairs.

    Create the mapping dictionary: you can do this with a dictionary comprehension combined with zip, but there are also other ways of doing this:

    col = 'categorical_feature'
    mapping_dict = {k: v for k, v in zip(df[col], df[col].cat.codes)}
    

    Now you can apply that category: label mapping to the other DataFrame:

    # other_df stands for the second DataFrame (name used for illustration)
    other_df['labels'] = other_df[col].map(mapping_dict)
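A self-contained sketch of the whole approach, assuming small hypothetical DataFrames `df` and `df2`. It checks that the zip-based dictionary matches one built by enumerating `cat.categories` (each code is the position of its value in that index) and, as an alternative that is not part of the answer itself, reuses the first DataFrame's categories through a fixed `pd.CategoricalDtype`:

```python
import pandas as pd

# Hypothetical data: df2 lists values in another order, lacks 'a' and adds an unseen 'd'
df = pd.DataFrame({'categorical_feature': ['b', 'a', 'c', 'a']})
df2 = pd.DataFrame({'categorical_feature': ['c', 'c', 'b', 'd']})
col = 'categorical_feature'

# Encode the first DataFrame as in the question
df[col] = df[col].astype('category')
df['labels'] = df[col].cat.codes

# Mapping built as in the answer: pair every value with its code
mapping_dict = {k: v for k, v in zip(df[col], df[col].cat.codes)}

# Equivalent mapping: each code is the position of its category in cat.categories
mapping_from_categories = {cat: code for code, cat in enumerate(df[col].cat.categories)}
assert mapping_dict == mapping_from_categories

# Option 1: Series.map with the dictionary; values missing from the dict become NaN
df2['labels_map'] = df2[col].map(mapping_dict)

# Option 2: reuse the first DataFrame's categories as a fixed dtype;
# values outside those categories get code -1
fixed_dtype = pd.CategoricalDtype(categories=df[col].cat.categories)
df2['labels_dtype'] = df2[col].astype(fixed_dtype).cat.codes

print(df2)
```

Option 2 keeps integer codes and marks unseen values with -1, whereas `map` returns floats with NaN for them; which behavior is preferable depends on how unseen categories should be handled downstream.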