I am currently working on this colab. The task is to classify sentences into a certain category, so it is a multi-class problem, not a binary one like predicting the sentiment of a review (positive / negative) from the review text. For multi-class problems I thought that the number of units/neurons in the last layer has to match the number of classes I want to predict: with a binary problem I use one neuron, indicating 0 or 1, and with 5 classes I need 5 units. That's what I thought.
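For example, this is roughly the pattern I had in mind (an illustrative sketch I wrote for this question, not the colab code; the layer sizes and input shape are made up):

import tensorflow as tf

# Binary case: a single sigmoid unit outputs P(class = 1).
binary_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
binary_model.compile(loss='binary_crossentropy', optimizer='adam')

# 5-class case: one softmax unit per class.
multi_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(5, activation='softmax'),
])
multi_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')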
However, in the code of the colab there is the following:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(6, activation='softmax')
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
When I run the model.fit part of this colab, it works, but I do not understand why. When I check
print(label_tokenizer.word_index)
print(label_tokenizer.word_docs)
print(label_tokenizer.word_counts)
This gives
{'sport': 1, 'business': 2, 'politics': 3, 'tech': 4, 'entertainment': 5}
defaultdict(<class 'int'>, {'tech': 401, 'business': 510, 'sport': 511, 'entertainment': 386, 'politics': 417})
OrderedDict([('tech', 401), ('business', 510), ('sport', 511), ('entertainment', 386), ('politics', 417)])
So there are clearly 5 classes. However, when I change the last layer to tf.keras.layers.Dense(5, activation='softmax')
and run model.fit, it does not work: accuracy is always 0.
Why is 6 correct here and not 5?
It is 6 because the encoded targets are in [1, 5], but Keras's sparse_categorical_crossentropy treats each integer label as an index into the softmax output starting at 0, so the model needs an extra (unused) output for index 0.
To use Dense(5, activation='softmax')
you can simply do y - 1 so the labels lie in [0, 4], i.e. start from 0.
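You can see the indexing behaviour with a small standalone check (an illustrative sketch, not taken from the colab; the uniform probability tensors are made up):

import tensorflow as tf

labels = tf.constant([1, 2, 3, 4, 5])          # encoded labels, one example per class

# With 6 output units the valid indices are 0..5, so every label has a slot;
# index 0 simply never receives a positive example.
probs_6 = tf.fill((5, 6), 1.0 / 6.0)
print(tf.keras.losses.sparse_categorical_crossentropy(labels, probs_6))

# With 5 output units the valid indices are only 0..4, so label 5 has no slot.
# Shifting the labels into [0, 4] makes the 5-unit head work.
probs_5 = tf.fill((5, 5), 1.0 / 5.0)
print(tf.keras.losses.sparse_categorical_crossentropy(labels - 1, probs_5))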
Following the colab link, you can change the model to:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(5, activation='softmax')
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_padded, training_label_seq - 1, epochs=num_epochs,
                    validation_data=(validation_padded, validation_label_seq - 1), verbose=2)
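One consequence of the shift: when you read predictions back, add 1 to the argmax to get back to the ids in label_tokenizer.word_index (a small sketch using the names from the colab):

import numpy as np

pred_ids = np.argmax(model.predict(validation_padded), axis=1) + 1   # undo the y - 1 shift
index_to_label = {v: k for k, v in label_tokenizer.word_index.items()}
print([index_to_label[i] for i in pred_ids[:5]])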