I have a DataFrame that contains a categorical feature, which I have encoded in the following way:
df['categorical_feature'] = df['categorical_feature'].astype('category')
df['labels'] = df['categorical_feature'].cat.codes
If I apply the same code to another DataFrame with the same categorical field, the mapping is shuffled, but I need it to be consistent with the first DataFrame.
Is there a way to apply the same category: label mapping to another DataFrame that has the same categorical values?
I think you are looking for pd.Series.map(), which maps values from category to label using a dictionary that has category: label mappings.
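As a small illustration of how Series.map behaves with a dictionary, here is a minimal sketch. The series and the mapping values are made up for the example and are not taken from the question:

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
s = pd.Series(["red", "green", "blue", "purple"])
mapping = {"red": 2, "green": 1, "blue": 0}

# Each value is looked up in the dictionary.
mapped = s.map(mapping)

# "purple" has no entry in the dictionary, so its result is NaN.
print(mapped)
```

Values that are missing from the dictionary become NaN, so it is worth checking for them after mapping.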
Create the mapping dictionary: you can do this using a dictionary comprehension in combination with zip, but there are also other ways of doing this:
col = 'categorical_feature'
# pair each category value with its integer code from the first DataFrame
mapping_dict = {k: v for k, v in zip(df[col], df[col].cat.codes)}
Now you can apply that category: label mapping to the other DataFrame (called df2 here):
df2['labels'] = df2['categorical_feature'].map(mapping_dict)
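Putting the steps together, here is a minimal end-to-end sketch. The two toy frames and the name df2 are assumptions made for illustration. It builds the dictionary from the first frame's categories rather than with zip, which is one of the "other ways" and should give the same mapping, and it also shows a possible alternative that reuses the first frame's categorical dtype:

```python
import pandas as pd

# Hypothetical toy frames standing in for the two DataFrames in the question.
df = pd.DataFrame({"categorical_feature": ["b", "a", "c", "a"]})
df2 = pd.DataFrame({"categorical_feature": ["c", "c", "a"]})  # "b" is absent

col = "categorical_feature"
df[col] = df[col].astype("category")
df["labels"] = df[col].cat.codes

# Encoding df2 on its own derives codes from df2's categories only,
# which is why the labels do not line up with df.
naive = df2[col].astype("category").cat.codes

# Build the mapping once from the first frame's categories ...
mapping_dict = {cat: code for code, cat in enumerate(df[col].cat.categories)}

# ... and apply it to the second frame.
df2["labels"] = df2[col].map(mapping_dict)

# Alternative: cast to the first frame's categorical dtype, then take codes.
df2["labels_via_dtype"] = df2[col].astype(df[col].dtype).cat.codes

print(df2)
```

If the second frame contains a value the first one never saw, map yields NaN for it, while the dtype approach yields the code -1.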