python keras one-hot-encoding

One-hot encoding of an array of floats using just Keras


First off, I am new to Stack Overflow, so if there is a way to improve how I formulate my question, or if I missed something obvious, please point it out!

I am building a convolutional classification network in Keras, where the network is asked to predict which parameter value was used to generate the image. The classes are encoded as 5 float values, e.g. a list of the classes may look like this:

[[0.], [0.76666665], [0.5], [0.23333333], [1.]]

I want to one-hot encode these classes, using the keras.utils.to_categorical(y, num_classes=5, dtype='float32') function.

However, it returns the following:

array(
    [
       [1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.]
    ], 
dtype=float32)

It only takes integers as input, so every value < 1. is mapped to class 0. I could circumvent this by multiplying all values by a constant so that they become integers, and I think there is also a way to solve this with scikit-learn, but that sounds like a large workaround for a problem that should be trivial to solve within Keras alone, which makes me believe I am missing something obvious.
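To make the truncation concrete, here is a minimal sketch in plain Python. It assumes that to_categorical casts its input to integers before using the values as column indices, which is consistent with the output shown above but may not hold for every Keras version. The one-hot rows are built by hand purely for illustration.

```python
# Class labels as given in the question, flattened to a plain list.
labels = [0.0, 0.76666665, 0.5, 0.23333333, 1.0]

# Casting to int truncates toward zero, so every value below 1 becomes 0.
as_int = [int(v) for v in labels]
print(as_int)  # [0, 0, 0, 0, 1]

# One-hot rows built by hand from the truncated indices (5 classes).
num_classes = 5
one_hot = [
    [1.0 if column == index else 0.0 for column in range(num_classes)]
    for index in as_int
]
for row in one_hot:
    print(row)
```

The first four rows all select column 0 and the last selects column 1, matching the array returned above.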

I hope somebody is able to point out a simple alternative using just Keras.


Solution

  • Because floating-point values are continuous, it's not advisable to one-hot encode them directly. Instead, map each value to an integer class index and encode the indices, for example:

    a = {}
    classes = []
    
    for item, i in zip(your_array, range(len(your_array))):
        a[str(i)] = item
        classes.append(str(i))
    
    encoded_classes = to_categorical(classes)
    

    The dictionary lets you look up the actual value for each class later.

    EDIT: Updated after comment from nuric.

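    from keras.utils import to_categorical
    
    your_array = [[0.], [0.76666665], [0.5], [0.23333333], [1.]]
    
    class_values = {}  # class index (as a string) -> original float value
    classes = []       # integer class indices passed to to_categorical
    
    for i, item in enumerate(your_array):
        class_values[str(i)] = item
        classes.append(i)
    
    # One-hot encode the integer indices rather than the float values.
    encoded_classes = to_categorical(classes)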
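If the label array holds one entry per training sample rather than one entry per class, the same float will appear many times, and enumerating positions would give every sample its own class. A possible variant, sketched below in plain Python, assigns indices per distinct value instead. The sample labels and the names value_to_index, class_indices and predicted_index are made up for illustration. It also assumes repeated labels are exactly equal floats; values that differ by rounding error would need to be rounded first.

```python
# Hypothetical per-sample labels; repeated values belong to the same class.
labels = [0.0, 0.76666665, 0.5, 0.23333333, 1.0, 0.5, 0.0]

# Sort the distinct values so the index assignment is deterministic.
class_values = sorted(set(labels))
value_to_index = {value: index for index, value in enumerate(class_values)}

# Integer class indices, one per sample.
class_indices = [value_to_index[value] for value in labels]
print(class_indices)  # [0, 3, 2, 1, 4, 2, 0]

# Recover the original float for a predicted class index.
predicted_index = 3
print(class_values[predicted_index])  # 0.76666665
```

The resulting class_indices list contains only integers, so it can be passed to to_categorical in place of classes, and class_values plays the role of the lookup dictionary.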