I am using
pd.get_dummies
to transform a categorical vector with 4 labels (strings) into a 2D array with 4 columns. However, I couldn't find a way to go back to the original values afterwards. I also couldn't do this when using
sklearn.preprocessing.OneHotEncoder
What is the best way to one-hot-encode a categorical vector while keeping the ability to recover the original values afterwards?
You can make use of the inverse_transform
method of sklearn.preprocessing.OneHotEncoder
to do it. I have illustrated it with an example below:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male'], ['Female'], ['Female']]
enc.fit(X)
enc.categories_
[array(['Female', 'Male'], dtype=object)]
enc.transform([['Female'], ['Male']]).toarray()
array([[1., 0.],
       [0., 1.]])
enc.inverse_transform([[0, 1], [1, 0], [0, 1]])
array([['Male'],
       ['Female'],
       ['Male']], dtype=object)
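Since the question also mentions pd.get_dummies, here is a minimal sketch of one way to undo it, assuming each row has exactly one active column: take the name of the column holding the 1 with idxmax(axis=1). The variable names (labels, dummies, recovered) are only illustrative, and dtype=int is passed so the result does not depend on the default dummy dtype of your pandas version.

```python
import pandas as pd

# Illustrative data, same labels as the encoder example.
labels = pd.Series(["Male", "Female", "Female"])

# One column per category; columns are named after the labels.
dummies = pd.get_dummies(labels, dtype=int)

# For each row, return the name of the column holding the maximum (the 1).
# Assumes exactly one column is set per row.
recovered = dummies.idxmax(axis=1)

print(recovered.tolist())
```

This only works while the dummy columns keep their original label names; if you pass a prefix to get_dummies, you would have to strip it again afterwards.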
To build a dictionary mapping each category to its one-hot encoding, you could do this:
A = {}
for i in enc.categories_[0]:
A[i] = enc.transform([[i]]).toarray()
Note that each value in A is a 2D array of shape (1, n_categories). There may be a more direct way to do this.
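One possible alternative, sketched under the assumption that the encoder is fitted on a single column with default settings (no drop option): the output columns then follow the order of enc.categories_[0], so the rows of an identity matrix can serve as the one-hot vectors without calling transform in a loop. The name category_to_vector is just illustrative.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown='ignore')
enc.fit([['Male'], ['Female'], ['Female']])

# Categories of the first (and only) feature, in column order.
categories = enc.categories_[0]

# Row i of the identity matrix is the one-hot vector for categories[i].
one_hot_rows = np.eye(len(categories))
category_to_vector = {cat: one_hot_rows[i] for i, cat in enumerate(categories)}

print(category_to_vector)
```

Here each value is a 1D array, unlike the loop version, which stores a 2D array with a single row per category.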