python pandas data-science categorical-data one-hot-encoding

Python: how to get back the original values after one-hot encoding with pd.get_dummies

I am using pd.get_dummies to transform a categorical vector with 4 labels (strings) into a 2D array with 4 columns. However, I couldn't find a way to get back to the original values afterwards. I also couldn't do this when using sklearn.preprocessing.OneHotEncoder.

What is the best way to one-hot encode a categorical vector while keeping the ability to recover the original values afterwards?

Solution

You can use the inverse_transform method of sklearn.preprocessing.OneHotEncoder for this, as the example below shows:

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male'], ['Female'], ['Female']]
enc.fit(X)
enc.categories_

[array(['Female', 'Male'], dtype=object)]

enc.transform([['Female'], ['Male']]).toarray()

array([[1., 0.],
       [0., 1.]])

enc.inverse_transform([[0, 1], [1, 0], [0, 1]])

array([['Male'],
       ['Female'],
       ['Male']], dtype=object)
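For the pd.get_dummies case from the question, a similar round trip can be sketched on the pandas side. This is a minimal sketch, not a definitive recipe: the sample labels are made up for illustration, and it assumes every row has exactly one hot column (no missing values, no drop_first=True).

```python
import pandas as pd

# Hypothetical sample data: a categorical vector with 4 string labels
labels = pd.Series(["red", "green", "blue", "yellow", "green"])

# One-hot encode: one integer 0/1 column per distinct label
dummies = pd.get_dummies(labels, dtype=int)

# Each row has exactly one 1, so the name of the column holding the
# row maximum is the original label
recovered = dummies.idxmax(axis=1)

print(recovered.tolist())
```

Recent pandas versions (1.5 and later) also provide pd.from_dummies, which may be a more direct fit if it is available in your environment.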

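To get a dictionary that maps each category to its one-hot vector, you could do this:

# Map each category label to its encoded row, an array of shape (1, n_categories)
A = {}
for i in enc.categories_[0]:
    A[i] = enc.transform([[i]]).toarray()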

There may be a more direct way to do this.
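One possibly more direct option is to build the mapping from an identity matrix instead of calling transform once per category. This is a sketch under two assumptions: the encoder was fitted on a single feature, and the default drop=None is in effect, so the output columns follow the order of enc.categories_[0]. The name category_to_onehot is chosen here for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown='ignore')
enc.fit([['Male'], ['Female'], ['Female']])

categories = enc.categories_[0]

# Row i of the identity matrix is the one-hot vector for categories[i],
# because the encoded columns follow the order of categories_
category_to_onehot = dict(zip(categories, np.eye(len(categories))))

print(category_to_onehot)
```

Unlike the loop above, each value here is a 1D array rather than an array of shape (1, n_categories).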