Most examples I've seen implement softmax on the last layer. But I read that Keras categorical_crossentropy
automatically applies softmax after the last layer so doing it is redundant and leads to reduced performance. Who is right?
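By default, Keras categorical_crossentropy does not apply softmax to the output (see the categorical_crossentropy implementation and the TensorFlow backend call), so it expects the model to output probabilities and the softmax on the last layer is not redundant. However, if you use the backend function directly, you can set from_logits=True. In that case the function applies softmax itself, and the last layer should output raw logits with no softmax activation.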
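
To make the difference concrete, here is a minimal plain-Python sketch of the two conventions. It is not Keras's actual implementation; the helper names (softmax, cross_entropy_from_probs, cross_entropy_from_logits) and the sample values are made up for illustration. It shows that softmax followed by a probability-based loss gives the same value as a logits-based loss on the raw outputs, and that applying softmax twice changes the result.

```python
import math

# Hypothetical helpers for illustration only; not the Keras API.

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_probs(y_true, probs):
    # Analogous to from_logits=False: inputs are already probabilities.
    return -sum(t * math.log(p) for t, p in zip(y_true, probs))

def cross_entropy_from_logits(y_true, logits):
    # Analogous to from_logits=True: log-softmax is computed inside the loss.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_sum_exp) for t, z in zip(y_true, logits))

y_true = [0.0, 1.0, 0.0]   # one-hot target (made-up example)
logits = [2.0, 1.0, 0.1]   # raw last-layer outputs (made-up example)

# Softmax in the model + probability-based loss.
loss_a = cross_entropy_from_probs(y_true, softmax(logits))
# No softmax in the model + logits-based loss.
loss_b = cross_entropy_from_logits(y_true, logits)
# Softmax applied twice: a different (and unintended) loss value.
loss_double = cross_entropy_from_probs(y_true, softmax(softmax(logits)))

print(loss_a, loss_b, loss_double)
```

The first two values agree, so either setup is fine as long as softmax is applied exactly once. The third value is what you would get by keeping the softmax layer and also asking the loss to treat its input as logits.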