python pandas data-science categorical-data one-hot-encoding

Python: how to get back the original values after one-hot encoding with pd.get_dummies


I am using pd.get_dummies to transform a categorical vector with 4 labels (strings) into a 2D array with 4 columns. However, I couldn't find a way to go back to the original values afterwards. I also couldn't do this when using sklearn.preprocessing.OneHotEncoder.

What is the best way to one-hot encode a categorical vector while keeping the ability to recover the original values afterwards?
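For the pd.get_dummies case specifically, the original labels can be read straight off the dummy columns: each row has exactly one 1, and that column's name is the label. A minimal sketch, assuming a single Series of string labels (the values below are hypothetical). pandas 1.5 also added pd.from_dummies for this; the idxmax approach works on older versions as well.

```python
import pandas as pd

# Hypothetical categorical vector with 4 distinct string labels
labels = pd.Series(["cat", "dog", "bird", "fish", "dog"])

# One-hot encode; dtype=int keeps the dummy columns as 0/1 integers
dummies = pd.get_dummies(labels, dtype=int)

# The column holding the single 1 in each row is the original label
recovered = dummies.idxmax(axis=1)

print(recovered.tolist())
```

If get_dummies is called with a prefix, the recovered names will carry that prefix and need to be stripped.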


Solution

  • You can use the inverse_transform method of sklearn.preprocessing.OneHotEncoder for this. Here is an example:

    from sklearn.preprocessing import OneHotEncoder
    enc = OneHotEncoder(handle_unknown='ignore')
    X = [['Male'], ['Female'], ['Female']]
    enc.fit(X)
    enc.categories_
    # [array(['Female', 'Male'], dtype=object)]

    enc.transform([['Female'], ['Male']]).toarray()
    # array([[1., 0.],
    #        [0., 1.]])

    enc.inverse_transform([[0, 1], [1, 0], [0, 1]])
    # array([['Male'],
    #        ['Female'],
    #        ['Male']], dtype=object)
    

    To build a dictionary that maps each category to its one-hot vector, you could do this:

    # Map each category to its one-hot encoding (a 1 x n_categories array)
    A = {}
    for i in enc.categories_[0]:
        A[i] = enc.transform([[i]]).toarray()
    

    There may be a better way to do this, though.
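A fuller round trip, as a sketch assuming four hypothetical string labels like those in the question: fit_transform encodes them, inverse_transform decodes the dense array, and because the encoder is built with handle_unknown='ignore', a category never seen during fit encodes to an all-zero row, which inverse_transform decodes to None.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical labels; OneHotEncoder expects a 2D input (n_samples, n_features)
labels = np.array(["cat", "dog", "bird", "fish", "dog"]).reshape(-1, 1)

enc = OneHotEncoder(handle_unknown="ignore")
onehot = enc.fit_transform(labels).toarray()  # dense array of shape (5, 4)

# inverse_transform maps each one-hot row back to its category
recovered = enc.inverse_transform(onehot)
print(recovered.ravel().tolist())

# An unseen category becomes an all-zero row and decodes to None
unseen = enc.transform([["horse"]]).toarray()
print(enc.inverse_transform(unseen).ravel().tolist())
```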
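One alternative to the dictionary loop is to encode all categories in a single transform call. This sketch relies on OneHotEncoder ordering its output columns the same way as enc.categories_[0]; the names category_to_onehot and onehot_to_category are only illustrative. Unlike the loop, each value here is a 1D vector rather than a 1 x n array, which also makes a reverse lookup easy.

```python
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit([["Male"], ["Female"], ["Female"]])

cats = enc.categories_[0]  # sorted categories: ['Female', 'Male']

# Encode every category at once; row k is the one-hot vector for cats[k]
rows = enc.transform(cats.reshape(-1, 1)).toarray()
category_to_onehot = dict(zip(cats, rows))

# Reverse lookup: tuples are hashable, so they can serve as dictionary keys
onehot_to_category = {tuple(row): cat for cat, row in category_to_onehot.items()}

print(category_to_onehot["Male"])
print(onehot_to_category[(0.0, 1.0)])
```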