pandas categorical-data

Reverse get_dummies()


My dataframe looks like this after converting the categorical columns to numerical ones using get_dummies():

score1  score2  country_CN  country_AU  category_leader  category_
0.89    0.45    0           1           0                1
0.55    0.54    1           0           1                0

As you can see, the categorical columns that were converted to numerical ones are country_CN, country_AU, category_leader and category_.

I want to bring it back to its original form, something like this:

score1  score2  country  category_leader
0.89    0.45    AU
0.55    0.54    CN       leader

I have tried using the suggestion listed here: Reverse a get_dummies encoding in pandas

But no luck as of yet.

Any help or clue?


Solution

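  • You can first move the non-dummy columns into the index with DataFrame.set_index, so that only the dummy columns are passed to undummify (the function from the linked answer):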

    #https://stackoverflow.com/a/62085741/2901002
    df = undummify(df.set_index(['score1','score2'])).reset_index()
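
The snippet above calls undummify without defining it; the function comes from the linked answer. For completeness, here is a minimal sketch of what such a helper could look like. It is an illustrative reimplementation, not necessarily the exact code of that answer, and it assumes that every dummy column is named prefix_label and that each row has exactly one 1 per prefix. The sample df is rebuilt from the question's data.

```python
import pandas as pd


def undummify(df, prefix_sep="_"):
    """Collapse dummy columns such as country_CN / country_AU into one column.

    Illustrative sketch: assumes exactly one 1 per prefix in every row.
    """
    # Group column names by the text before the first separator.
    groups = {}
    for col in df.columns:
        groups.setdefault(col.split(prefix_sep, 1)[0], []).append(col)

    pieces = []
    for prefix, cols in groups.items():
        if len(cols) == 1 and prefix_sep not in cols[0]:
            # Not a dummy column: keep it unchanged.
            pieces.append(df[cols[0]])
        else:
            # idxmax gives, per row, the name of the column holding the 1;
            # cutting off "<prefix><sep>" leaves the category label.
            labels = df[cols].idxmax(axis=1).str[len(prefix) + len(prefix_sep):]
            pieces.append(labels.rename(prefix))
    return pd.concat(pieces, axis=1)


# Sample data rebuilt from the question.
df = pd.DataFrame({
    "score1": [0.89, 0.55],
    "score2": [0.45, 0.54],
    "country_CN": [0, 1],
    "country_AU": [1, 0],
    "category_leader": [0, 1],
    "category_": [1, 0],
})

restored = undummify(df.set_index(["score1", "score2"])).reset_index()
print(restored)
```

Note that a row with all zeros for a prefix (for example after get_dummies(drop_first=True)) would be mapped to the first column of that prefix, so this sketch is only suitable for full one-hot encodings.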
    

    Or use an alternative solution: reshape with DataFrame.melt, filter rows with boolean indexing, split the column names with Series.str.split and finally pivot with DataFrame.pivot:

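    # Long format: one row per (score1, score2, dummy column)
    df1 = df.melt(['score1','score2'])
    # Keep only the dummy columns that are set to 1
    df1 = df1[df1['value'].eq(1)].copy()
    # 'country_CN' -> a='country', b='CN'; n=1 splits on the first '_' only
    df1[['a','b']] = df1.pop('variable').str.split('_', n=1, expand=True)
    # One column per prefix again (a list as index needs pandas 1.1+)
    df1 = df1.pivot(index=['score1','score2'], columns='a', values='b').reset_index()
    print (df1)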
    a  score1  score2 category country
    0    0.55    0.54   leader      CN
    1    0.89    0.45               AU
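
As a further option not mentioned in the answer above, newer pandas versions (1.5 and later) ship pd.from_dummies, which reverses get_dummies directly. The following is a minimal sketch under the assumption that every dummy column contains the '_' separator, that the other columns do not, and that each row has exactly one 1 per prefix. The names dummy_cols, decoded and restored are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "score1": [0.89, 0.55],
    "score2": [0.45, 0.54],
    "country_CN": [0, 1],
    "country_AU": [1, 0],
    "category_leader": [0, 1],
    "category_": [1, 0],
})

# from_dummies expects only dummy columns, so select them first.
# Assumption: the non-dummy columns (score1, score2) contain no "_".
dummy_cols = [c for c in df.columns if "_" in c]

# Requires pandas >= 1.5. sep="_" splits "country_CN" into prefix and label.
decoded = pd.from_dummies(df[dummy_cols], sep="_")

# Put the decoded columns back next to the untouched score columns.
restored = pd.concat([df.drop(columns=dummy_cols), decoded], axis=1)
print(restored)
```

If some rows have no 1 for a prefix (for example after drop_first=True), from_dummies raises an error unless a default_category is passed.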