
One Hot Encoding list of strings


I have a list of strings that serve as labels for my classification problem (image recognition with a convolutional neural network). Each label consists of 5 to 8 characters (digits 0 to 9 and letters A to Z). To train the network, I would like to one-hot encode the labels. I wrote code that encodes a single label, but I am having difficulty applying it to a list.

Here is my code for a single label, which works fine:

from numpy import argmax
# define input string
data = '7C24698'
print(data)
# define universe of possible input values
characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '
# define a mapping of chars to integers
char_to_int = dict((c, i) for i, c in enumerate(characters))
int_to_char = dict((i, c) for i, c in enumerate(characters))
# integer encode input data
integer_encoded = [char_to_int[char] for char in data]
print(integer_encoded)
# one hot encode
onehot_encoded = list()
for value in integer_encoded:
    character = [0 for _ in range(len(characters))]
    character[value] = 1
    onehot_encoded.append(character)
print(onehot_encoded)
# invert encoding
inverted = int_to_char[argmax(onehot_encoded[0])]
print(inverted)

I now want to get the same output for a list of labels and store it in a new list:

list_of_labels = ['7C24698', 'NDK745']
encoded_labels = []

How can I do this?


Solution

  • You can wrap your working code in a function and then use the built-in map function to apply it to each element of list_of_labels:

    # wrap the single-label encoding in a reusable function
    # (numpy's argmax is only needed for inverting the encoding, so it is not imported here)
    def my_onehot_encoded(data):
        # define universe of possible input values
        characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '
        # define a mapping of chars to integers
        char_to_int = dict((c, i) for i, c in enumerate(characters))
        int_to_char = dict((i, c) for i, c in enumerate(characters))
        # integer encode input data
        integer_encoded = [char_to_int[char] for char in data]
        # one hot encode
        onehot_encoded = list()
        for value in integer_encoded:
            character = [0 for _ in range(len(characters))]
            character[value] = 1
            onehot_encoded.append(character)
    
        return onehot_encoded
    
    
    list_of_labels = ['7C24698', 'NDK745']
    encoded_labels = list(map(my_onehot_encoded, list_of_labels))
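
Note that the labels in the question have different lengths (5 to 8 characters), so the nested lists produced above are ragged, and most network inputs expect a fixed shape. A minimal sketch of one way to handle this follows, assuming the trailing space in the character set is meant as a padding character and that 8 is the maximum label length. The names encode_labels, decode_labels and MAX_LEN are hypothetical and not part of the original code:

```python
import numpy as np

# Same alphabet as in the question; the trailing space is ASSUMED to be padding.
CHARACTERS = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '
CHAR_TO_INT = {c: i for i, c in enumerate(CHARACTERS)}
MAX_LEN = 8  # assumption: no label is longer than 8 characters


def encode_labels(labels, max_len=MAX_LEN):
    """Return an array of shape (len(labels), max_len, len(CHARACTERS))."""
    encoded = np.zeros((len(labels), max_len, len(CHARACTERS)), dtype=np.float32)
    for row, label in enumerate(labels):
        if len(label) > max_len:
            raise ValueError(f"label {label!r} is longer than {max_len} characters")
        padded = label.ljust(max_len)  # pad short labels with spaces on the right
        for pos, char in enumerate(padded):
            encoded[row, pos, CHAR_TO_INT[char]] = 1.0
    return encoded


def decode_labels(encoded):
    """Invert encode_labels: argmax per position, then strip the padding."""
    indices = encoded.argmax(axis=-1)
    return [''.join(CHARACTERS[i] for i in row).rstrip() for row in indices]


list_of_labels = ['7C24698', 'NDK745']
encoded_labels = encode_labels(list_of_labels)
print(encoded_labels.shape)           # (2, 8, 37)
print(decode_labels(encoded_labels))  # ['7C24698', 'NDK745']
```

Padding on the right keeps character positions aligned across labels. Whether the network should predict the padding symbol as an extra class is a modelling decision that this sketch does not settle.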