python pandas dataframe dimensionality-reduction

How to reduce the cardinality of a categorical feature using a lookup table


I have a dataframe (df1) with one categorical column:

df1 = pd.DataFrame({'COL1': ['AA','AB','BC','AC','BA','BB','BB','CA','CB','CD','CE']})

I have another dataframe (df2) which has two columns

df2 = pd.DataFrame({'Category':['AA','AB','AC','BA','BB','BC','CA','CB','CC','CD','CE','CF'],'general_mapping':['A','A','A','B','B','B','C','C','C','C','C','C']})

I need to replace the values in df1 using the mapping in df2, so that df1 ends up looking like this:

df1->> ({'COL1': ['A','A','B','A','B','B','B','C','C','C','C']})

Solution

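  • You can use pd.Series.map after setting Category as the index of df2 with df.set_index. The resulting Series acts as a lookup table: each value in COL1 is replaced by the general_mapping entry whose index label matches it.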

    df1['COL1'] = df1['COL1'].map(df2.set_index('Category')['general_mapping'])
    df1
       COL1
    0     A
    1     A
    2     B
    3     A
    4     B
    5     B
    6     B
    7     C
    8     C
    9     C
    10    C
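
For completeness, here is a self-contained sketch of the same approach that also covers a case the question does not ask about: values in COL1 that have no entry in df2. With a plain map those become NaN, so the sketch falls back to the original value instead. The helper name reduce_categories and the sample value 'ZZ' are made up for illustration, and the sketch assumes the Category column in df2 contains no duplicates.

```python
import pandas as pd

df1 = pd.DataFrame({'COL1': ['AA', 'AB', 'BC', 'AC', 'BA', 'BB', 'BB', 'CA', 'CB', 'CD', 'CE']})
df2 = pd.DataFrame({
    'Category': ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC', 'CD', 'CE', 'CF'],
    'general_mapping': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C'],
})

# Series indexed by Category; map() looks each value up in this index.
# Assumption: Category values are unique.
lookup = df2.set_index('Category')['general_mapping']


def reduce_categories(values, lookup):
    """Hypothetical helper: map values through lookup, keeping the
    original value wherever the lookup has no matching entry."""
    mapped = values.map(lookup)   # unmatched values become NaN here
    return mapped.fillna(values)  # fill those gaps with the original values


df1['COL1'] = reduce_categories(df1['COL1'], lookup)
print(df1)

# 'ZZ' is an invented value that does not appear in df2, so it is left unchanged.
extra = reduce_categories(pd.Series(['AA', 'ZZ']), lookup)
print(extra.tolist())
```

If unmatched values should instead be flagged, drop the fillna step and check the result with isna().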