python machine-learning nlp one-hot-encoding

How to see the class names of one-hot encoded labels?

I have a CSV file with two columns: the 'Text' of a tweet and its 'Label'. Each tweet belongs to one of four categories: Hate, Neutral, CounterHate, and Non-Asian Aggression. I one-hot encoded the y values for the train and test sets with the following Python code:

encoder = LabelEncoder()
y_train = encoder.fit_transform(train['Label'].values)
y_train = to_categorical(y_train) 
y_test = encoder.fit_transform(test['Label'].values)
y_test = to_categorical(y_test)

Printing the first element:

print(y_train[0])

gives:

[0. 1. 0. 0.]

Each label is converted to a vector of length 4, where each position corresponds to one class. How can I find which position corresponds to which class?

For example: Hate=0, CounterHate=1,...

Solution

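First, note that the encoder should be fitted on the training set only (fit_transform) and then reused to transform the test set (transform, not fit_transform). Calling fit_transform again on the test labels refits the encoder, so the integer codes can differ between the two sets if the test set is missing a class. To see the mapping, look at the classes_ attribute: it lists the labels in the order of their integer codes, which is also the column order produced by to_categorical. To recover the original labels from integer codes, use the inverse_transform method.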

from sklearn import preprocessing

le = preprocessing.LabelEncoder()
# The four categories from the question
le.fit(['Hate', 'Neutral', 'CounterHate', 'Non-Asian Aggression'])
# classes_ is sorted; the index of each label is its integer code
print(list(le.classes_))
print(le.transform(['CounterHate', 'Hate', 'Neutral']))
# Map integer codes back to the original labels
print(le.inverse_transform([2, 2, 1]))

Output:

['CounterHate', 'Hate', 'Neutral', 'Non-Asian Aggression']
[0 1 2]
['Neutral' 'Neutral' 'Hate']
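
To tie this back to the one-hot vectors in the question, here is a minimal sketch of the whole flow. It makes a few assumptions: train_labels and test_labels are made-up stand-ins for train['Label'].values and test['Label'].values, and np.eye(n_classes)[codes] is used as a NumPy substitute for to_categorical so the snippet runs without Keras. The index of the 1 in each vector (found with np.argmax) is the integer code, and classes_ or inverse_transform turns it back into a name.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for train['Label'].values and test['Label'].values
train_labels = np.array(
    ["Hate", "Neutral", "CounterHate", "Non-Asian Aggression", "Hate"]
)
test_labels = np.array(["Neutral", "CounterHate"])

encoder = LabelEncoder()
y_train_int = encoder.fit_transform(train_labels)  # fit on training labels only
y_test_int = encoder.transform(test_labels)        # reuse the same mapping

# NumPy substitute for to_categorical: row i of the identity matrix
# is the one-hot vector for integer code i
n_classes = len(encoder.classes_)
y_train = np.eye(n_classes)[y_train_int]
y_test = np.eye(n_classes)[y_test_int]

# Position of each class inside the one-hot vector
class_positions = {str(name): idx for idx, name in enumerate(encoder.classes_)}
print(class_positions)

# Decode one-hot rows back to class names
decoded = encoder.inverse_transform(np.argmax(y_train, axis=1))
print(y_train[0], decoded[0])
```

With these sample labels, the first training row is [0. 1. 0. 0.], which decodes to Hate because Hate sits at index 1 of classes_. The same np.argmax plus inverse_transform step should also work on model predictions, provided the model was trained on vectors built from this encoder.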