I have a simple Keras sequential model. I have N categories and I have to predict in which category the next point will fall based on the previous ones.
The weird thing is that when I remove the Softmax activation from the output layer, performance is better (lower loss and higher sparse_categorical_accuracy). As loss I'm using sparse_categorical_crossentropy with from_logits=True.
Is there any reason for that? Shouldn't it be the opposite?
Thank you in advance for any suggestion!
import tensorflow as tf

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size, activation='softmax')
    ])
    return model

model = build_model(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)

def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss, metrics=['sparse_categorical_accuracy'])

EPOCHS = 5
history = model.fit(train_set, epochs=EPOCHS, validation_data=val_set)
In a nutshell, when you use the option from_logits=True, you are telling the loss function that your network's output is not normalized, i.e. that it consists of raw logits. Since you are using a softmax activation in your last layer, your outputs are in fact already normalized, so you have two consistent options:

1. keep the softmax activation and pass from_logits=False to the loss, or
2. remove the softmax from the last Dense layer and keep from_logits=True.

The second option is generally preferred, because computing the cross-entropy directly from the logits is more numerically stable. Your current setup mixes the two: the loss applies a softmax internally on top of the one already in the layer, so the output distribution gets squashed toward uniform, the gradients shrink, and training suffers. That is why removing the softmax improves both your loss and your accuracy.
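You can see the double-softmax effect on a toy example. The snippet below is a minimal sketch (the logit values are made up for illustration, and it assumes TF 2.x eager execution): it compares the loss computed from raw logits against the loss computed when an already-softmaxed output is passed to a loss that expects logits, which is what your model is doing now.

import tensorflow as tf

# Raw, unnormalized scores for 3 classes (made-up values for illustration).
logits = tf.constant([[2.0, 1.0, 0.1]])

# Softmax once: the intended probability distribution.
probs = tf.nn.softmax(logits)        # ~[0.659, 0.242, 0.099]

# Softmax twice: what happens when a softmax output is fed into a loss
# that expects logits -- the distribution is squashed toward uniform.
double = tf.nn.softmax(probs)        # ~[0.449, 0.296, 0.256]

labels = tf.constant([0])

# Correct: raw logits with from_logits=True.
good = tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

# What your model effectively does: probabilities treated as logits.
bad = tf.keras.losses.sparse_categorical_crossentropy(labels, probs, from_logits=True)

print(good.numpy())  # ~[0.417]
print(bad.numpy())   # ~[0.802] -- larger loss and flatter gradients

If you keep from_logits=True, just drop activation='softmax' from the Dense layer; when you need actual probabilities at inference time, apply tf.nn.softmax to the model output yourself.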