I am trying to adapt the Convolutional Neural Network example of tflearn to a classification task with ~12000 distinct class labels and more than 1 million training examples. The number of labels is apparently a problem in terms of memory consumption when one-hot encoding them: I first map my string labels to contiguous integers and then pass these as a list to the to_categorical() function. The following code leads to a MemoryError:
trainY = to_categorical(trainY, nb_classes=n_classes)
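For reference, the label preparation looks roughly like this (raw_labels and label_to_id are just illustrative names, not my real variables):

from tflearn.data_utils import to_categorical

raw_labels = ["cat", "dog", "cat", "zebra"]  # in reality, >1 million string labels
label_to_id = {label: i for i, label in enumerate(sorted(set(raw_labels)))}
trainY = [label_to_id[label] for label in raw_labels]  # contiguous integer ids
n_classes = len(label_to_id)  # ~12000 in my case
trainY = to_categorical(trainY, nb_classes=n_classes)  # MemoryError at ~1M x 12000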
Do I have to encode the labels like this, or should I use a loss function other than cross-entropy? Can I train in batches with tflearn, e.g. by passing a generator to the DNN.fit() function?
Thanks for any advice!
In the regression layer, you can specify that the labels being fed in should be one-hot encoded on the fly:
tflearn.layers.regression(incoming_net,
                          loss='categorical_crossentropy',
                          batch_size=64,
                          to_one_hot=True,
                          n_classes=12000)
This way you should not run into a memory error, because the labels are one-hot encoded one batch at a time during training.
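Here is a minimal end-to-end sketch of this approach (the network architecture, input shape, and dummy data below are placeholders, not from the question): the labels stay a 1-D array of integer class ids, DNN.fit() receives them directly, and the regression layer one-hot encodes each batch internally.

import numpy as np
import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d

n_classes = 12000

# Placeholder architecture; substitute your actual CNN.
net = tflearn.input_data(shape=[None, 32, 32, 3])
net = conv_2d(net, 64, 3, activation='relu')
net = max_pool_2d(net, 2)
net = tflearn.fully_connected(net, 256, activation='relu')
net = tflearn.fully_connected(net, n_classes, activation='softmax')
net = tflearn.layers.regression(net,
                                loss='categorical_crossentropy',
                                batch_size=64,
                                to_one_hot=True,
                                n_classes=n_classes)

model = tflearn.DNN(net)

# trainY is kept as integer class ids; no call to to_categorical().
trainX = np.random.rand(1000, 32, 32, 3).astype(np.float32)  # dummy data
trainY = np.random.randint(0, n_classes, size=1000)
model.fit(trainX, trainY, n_epoch=1, batch_size=64)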