python numpy label one-hot-encoding

How to reverse one-hot encoding?


I have some labels that look something like this: 'ABC1234'. I one-hot encoded them using this code:

from numpy import argmax
# define input string

def my_onehot_encoded(label):
    # define universe of possible input values
    characters = '0123456789ABCDEFGHIJKLMNPQRSTUVWXYZ'
    # define a mapping of chars to integers
    char_to_int = dict((c, i) for i, c in enumerate(characters))
    int_to_char = dict((i, c) for i, c in enumerate(characters))
    # integer encode input data
    integer_encoded = [char_to_int[char] for char in label]
    # one hot encode
    onehot_encoded = list()
    for value in integer_encoded:
        character = [0 for _ in range(len(characters))]
        character[value] = 1
        onehot_encoded.append(character)

    return onehot_encoded

I get onehot-encoded labels of shape (7, 35).

I then created a model which should predict the labels. I use this code to predict the label of one image:

from skimage.io import imread
from skimage.transform import resize
import numpy as np
import math

img = imread('/content/gdrive/My Drive/2017-IWT4S-CarsReId_LP-dataset/2_4.png')
img = resize(img,(224,224))
img = img*1./255
img = np.reshape(img,[1,224,224,3])

classes = model.predict(img)

np.argmax(classes, axis=2)

This gives me an array with the predicted class index for each character position. For the label above, the result is array([[10, 11, 12, 1, 2, 3, 4]]). I would now like a function that decodes this array back to my original string label 'ABC1234'. How could I do this?


Solution

  • Decoding with a nested loop that appends one character at a time is needlessly inefficient.

    A simpler solution is to store the characters in a NumPy array and use each output row as an index array into it.

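    import numpy as np

    characters = '0123456789ABCDEFGHIJKLMNPQRSTUVWXYZ'
    # one array element per character, so integer arrays can index into it
    characters = np.array(list(characters))
    outputs = np.array([[10, 11, 12, 1, 2, 3, 4]])
    # each row of class indices selects its characters in a single indexing step
    labels = [''.join(characters[row]) for row in outputs]
    # ['ABC1234']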
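
Since the question asks for a function, the same indexing idea could be wrapped up roughly as follows. This is only a sketch: `decode_predictions` is a made-up helper name, and it assumes the model output has shape (batch, length, 35) as described in the question. The `fake_probs` array is a stand-in for `model.predict(img)`, so the snippet runs without the model.

```python
import numpy as np

# Same alphabet as in the question (35 symbols, the letter 'O' is not included).
CHARACTERS = np.array(list('0123456789ABCDEFGHIJKLMNPQRSTUVWXYZ'))


def decode_predictions(probs):
    """Hypothetical helper: map scores of shape (batch, length, 35) to strings."""
    probs = np.asarray(probs)
    # Most likely class index per character position, shape (batch, length).
    indices = probs.argmax(axis=-1)
    # Integer-array indexing looks up all characters at once.
    return [''.join(row) for row in CHARACTERS[indices]]


# Stand-in for model.predict(img): a one-hot batch encoding 'ABC1234'.
fake_probs = np.eye(len(CHARACTERS))[np.array([[10, 11, 12, 1, 2, 3, 4]])]
print(decode_predictions(fake_probs))  # ['ABC1234']
```

Doing the argmax inside the function means it can be called directly on the raw prediction, and it handles a whole batch of images in one call.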