python numpy one-hot-encoding

One-hot encoding from image labels using NumPy


I have been puzzling over this one-hot encoding problem. I am sure it is a simple process, but I have been looking at it for a while and cannot see my mistake.

I have a set of train_labels of shape (1080, 1) containing 6 integer classes. I am trying to turn these into one-hot vectors using the following:

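import numpy as np

def convert_to_one_hot(train_labels_conv, classes):
    # Index the identity matrix with the flattened labels, then transpose
    # so that each column is one one-hot vector.
    Y_train = np.eye(classes)[train_labels_conv.reshape(-1)].T
    return Y_train

Y_train = np.arange(6)
print(Y_train)
Y_train_hot = convert_to_one_hot(Y_train, len(Y_train))
print(Y_train_hot)

As a result, I simply get: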
[0 1 2 3 4 5]
[[1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]]

Should I not have received the whole one-hot matrix for my training labels? I would appreciate any pointers in the right direction, as I am not yet comfortable using Python.


Solution

  • If your labels are strings, you can use this function:

    import numpy as np
    
    target = np.array(['dog', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog', 
        'cat', 'cat', 'hamster', 'hamster'])
    
    def one_hot(array):
        unique, inverse = np.unique(array, return_inverse=True)
        onehot = np.eye(unique.shape[0])[inverse]
        return onehot
    
    print(one_hot(target))
    

    Output:
    [[0. 1. 0.]
     [0. 1. 0.]
     [1. 0. 0.]
     [1. 0. 0.]
     [1. 0. 0.]
     [0. 1. 0.]
     [0. 1. 0.]
     [1. 0. 0.]
     [1. 0. 0.]
     [0. 0. 1.]
     [0. 0. 1.]]
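
  • If your labels are integers, as in the question, the posted function already appears to work. The snippet passes np.arange(6) to it instead of train_labels, so the result is just the 6x6 identity matrix. Below is a minimal sketch, assuming the labels are integers from 0 to classes - 1. The random train_labels array is a hypothetical stand-in for the real data, and the parameter names are illustrative.

```python
import numpy as np

def convert_to_one_hot(labels, num_classes):
    """Return a (num_classes, n_samples) one-hot matrix for integer labels."""
    # Flatten (n, 1) labels to (n,) so they can be used as row indices.
    flat = np.asarray(labels).reshape(-1)
    # Row i of the identity matrix is the one-hot vector for class i;
    # transposing puts one sample per column, as in the question.
    return np.eye(num_classes)[flat].T

# Hypothetical stand-in for the real train_labels: shape (1080, 1), classes 0..5.
rng = np.random.default_rng(0)
train_labels = rng.integers(0, 6, size=(1080, 1))

Y_train_hot = convert_to_one_hot(train_labels, 6)
print(Y_train_hot.shape)  # (6, 1080)
```

    Drop the .T if you would rather have one sample per row, which is the layout the string-label function above returns.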