I have a CSV file with two columns: the 'Text' of a tweet and its 'Label'. Each tweet belongs to one of these 4 categories: Hate, Neutral, CounterHate, and Non-Asian Aggression. I one-hot encoded the y values for the train and test sets with the following Python code:
encoder = LabelEncoder()
y_train = encoder.fit_transform(train['Label'].values)
y_train = to_categorical(y_train)
y_test = encoder.fit_transform(test['Label'].values)
y_test = to_categorical(y_test)
If I print the first element:
print(y_train[0])
the output is:
[0. 1. 0. 0.]
We know that each Label is converted to a vector of length 4, where each position corresponds to a Label class. How can I find the position of each class?
For example: Hate=0, CounterHate=1, ...
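First, note that the encoder should be fitted on the training set only (fit_transform) and then used to just transform the test set. Your code calls fit_transform on the test set as well, which refits the encoder and can give a different mapping if the test set does not contain exactly the same labels. The mapping itself is stored in the classes_ attribute: the index of a label in classes_ is its integer code, and therefore its position in the one-hot vector. I also recommend the inverse_transform method to retrieve your original labels: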
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
# classes_ is stored in sorted order, so the order passed to fit() does not matter
le.fit(['Hate', 'Neutral', 'CounterHate', 'Non-Asian Aggression'])
print(le.classes_.tolist())
print(le.transform(['CounterHate', 'Hate', 'Neutral']))
print(le.inverse_transform([2, 2, 1]))
Output:
['CounterHate', 'Hate', 'Neutral', 'Non-Asian Aggression']
[0 1 2]
['Neutral' 'Neutral' 'Hate']
So, assuming your labels are spelled exactly like this, CounterHate=0, Hate=1, Neutral=2, Non-Asian Aggression=3, and [0. 1. 0. 0.] is Hate.
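Applied to the setup in the question, this could look roughly like the sketch below. It is only a minimal sketch under some assumptions: train_labels and test_labels are made-up stand-ins for train['Label'].values and test['Label'].values, and np.eye is used in place of to_categorical so the snippet runs without Keras.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for train['Label'].values and test['Label'].values.
train_labels = np.array(
    ["Hate", "Neutral", "CounterHate", "Non-Asian Aggression", "Hate"]
)
test_labels = np.array(["Neutral", "Hate"])

encoder = LabelEncoder()
y_train_int = encoder.fit_transform(train_labels)  # learn the mapping on train only
y_test_int = encoder.transform(test_labels)        # reuse the same mapping on test

# Position i of a one-hot vector corresponds to encoder.classes_[i].
class_to_index = {str(name): idx for idx, name in enumerate(encoder.classes_)}
print(class_to_index)

# One-hot encode with NumPy (assumption: stands in for Keras' to_categorical).
num_classes = len(encoder.classes_)
y_train = np.eye(num_classes)[y_train_int]
y_test = np.eye(num_classes)[y_test_int]

# Decode one-hot rows back to the original label strings.
decoded = encoder.inverse_transform(y_train.argmax(axis=1))
print(y_train[0], "->", decoded[0])
```

The dictionary answers "which position is which class" directly, and argmax followed by inverse_transform turns any one-hot row (or a row of predicted probabilities) back into a label.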