python machine-learning nlp one-hot-encoding

How can I see the class name for each position of a one-hot encoded label?


I have a CSV file with two columns: 'Text', which holds a tweet, and 'Label', which holds its label. Each tweet belongs to one of these 4 categories: Hate, Neutral, CounterHate, and Non-Asian Aggression. I one-hot encoded the Y values for the train and test sets with the following Python code:

from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

encoder = LabelEncoder()
y_train = encoder.fit_transform(train['Label'].values)
y_train = to_categorical(y_train)
y_test = encoder.fit_transform(test['Label'].values)
y_test = to_categorical(y_test)

If I print the first row:

print(y_train[0])

the output is:

[0. 1. 0. 0.]

I know that each label is converted to a vector of length 4, where each position corresponds to one class. How can I find which position belongs to which class?

For example: Hate=0, CounterHate=1, ...


Solution

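  • First, note that the encoder should be fit on the training set and then only used to transform the test set (call encoder.transform, not encoder.fit_transform, on test['Label']); refitting on the test labels can produce a different class-to-position mapping. To see which position belongs to which class, look at le.classes_: position i of the one-hot vector corresponds to le.classes_[i]. To turn encoded values back into the original labels, use the inverse_transform method.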

    from sklearn import preprocessing

    le = preprocessing.LabelEncoder()
    le.fit(['Hate', 'Neutral', 'CounterHate', 'Non-Asian Aggression'])
    # classes_ is sorted; classes_[i] is the class at one-hot position i
    print(le.classes_.tolist())
    print(le.transform(['CounterHate', 'Hate', 'Neutral']))
    print(le.inverse_transform([2, 2, 1]))
    

    output:

    ['CounterHate', 'Hate', 'Neutral', 'Non-Asian Aggression']
    [0 1 2]
    ['Neutral' 'Neutral' 'Hate']
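Putting both points together, a minimal sketch of the whole flow might look like the block below. It assumes scikit-learn and NumPy are installed. The toy label lists are hypothetical stand-ins for train['Label'] and test['Label'], and np.eye(n_classes)[y] is swapped in for keras' to_categorical so the sketch runs without TensorFlow (for the training labels it yields the same one-hot layout). The last lines show the refitting pitfall on the test labels.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for train['Label'] and test['Label']
train_labels = ["Hate", "Neutral", "CounterHate", "Non-Asian Aggression", "Neutral"]
test_labels = ["Neutral", "Hate"]

encoder = LabelEncoder()
y_train_int = encoder.fit_transform(train_labels)  # fit on the training labels only
y_test_int = encoder.transform(test_labels)        # reuse the same mapping for test

n_classes = len(encoder.classes_)
# Stand-in for to_categorical: row k is the one-hot vector for y[k],
# and column i corresponds to encoder.classes_[i]
y_train = np.eye(n_classes, dtype="float32")[y_train_int]
y_test = np.eye(n_classes, dtype="float32")[y_test_int]

# One-hot position -> class name
position_to_class = dict(enumerate(encoder.classes_.tolist()))
print(position_to_class)

# Decode one-hot rows: argmax gives the position, inverse_transform the name
decoded = encoder.inverse_transform(np.argmax(y_test, axis=1))
print(decoded.tolist())

# Pitfall: refitting on the test labels builds a new mapping from only the
# classes present there, so the same label can land in a different position
refit = LabelEncoder().fit(test_labels)
print(refit.classes_.tolist())
```

Because classes_ is sorted, these four labels always map to CounterHate=0, Hate=1, Neutral=2, Non-Asian Aggression=3, regardless of the order in which they appear in the data. The refit encoder, by contrast, only knows ['Hate', 'Neutral'] and would put Neutral at position 1 instead of 2.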