I have a basic beginner question about how neural networks are defined, and I am learning in the context of the Keras library. Following the MNIST hello world program, I have defined this network:
model = Sequential()
model.add(Dense(NB_CLASSES, input_shape=(RESHAPED,), activation='softmax'))
My understanding is that this creates a neural network with two layers; in this case RESHAPED is 784 and NB_CLASSES is 10, so the network will have one input layer with 785 neurons (784 pixel inputs plus a bias) and one output layer with 10 neurons.
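For reference, here is the same snippet written out in full with the constants filled in (a minimal, self-contained version of the book's example); printing the summary shows where the 785-per-output figure comes from:

from keras.models import Sequential
from keras.layers import Dense

RESHAPED = 784   # 28 x 28 MNIST images, flattened
NB_CLASSES = 10  # one output neuron per digit class

model = Sequential()
model.add(Dense(NB_CLASSES, input_shape=(RESHAPED,), activation='softmax'))

# Reports a single Dense layer with output shape (None, 10) and
# 7850 trainable parameters: (784 weights + 1 bias) x 10 output neurons.
model.summary()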
Then I added this:
model.compile(loss='categorical_crossentropy', optimizer=OPTIMIZER, metrics=['accuracy'])
I have read up on the formula for categorical cross-entropy, but it appears to be calculated per output node. My question is: during training, how are the cross-entropy values combined into a single scalar-valued objective function? Is it just an average?
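For concreteness, this is my rough understanding of what gets computed for a single training instance, written out in plain NumPy (just an illustration of the formula with made-up numbers, not the actual Keras code):

import numpy as np

# One training instance with NB_CLASSES = 10 output nodes
y_true = np.zeros(10); y_true[3] = 1.0            # one-hot label for class 3
y_pred = np.full(10, 0.05); y_pred[3] = 0.55      # softmax output, sums to 1

# Per-node terms: -y_k * log(p_k); only the true class contributes
per_node = -y_true * np.log(y_pred)

# Summing over the output nodes gives the cross-entropy for this one instance
instance_loss = per_node.sum()
print(per_node, instance_loss)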
Keras computes the mean of the per-instance loss values, possibly weighted (see the sample_weight_mode argument if you're interested).
Here's the reference in the source code: training.py. As you can see, the result goes through K.mean(...), which ensures the result is a scalar.
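Spelled out for a toy batch (a NumPy sketch of the reduction with made-up predictions, not the actual Keras internals):

import numpy as np

# Toy batch: 3 instances, 4 classes (made-up numbers, just for illustration)
y_true = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1]], dtype=float)
y_pred = np.array([[0.7, 0.1, 0.1, 0.1],
                   [0.2, 0.5, 0.2, 0.1],
                   [0.1, 0.1, 0.2, 0.6]], dtype=float)

# Per-instance categorical cross-entropy: -sum_k y_k * log(p_k)
per_instance = -np.sum(y_true * np.log(y_pred), axis=1)   # shape (3,)

# The scalar loss reported for the batch is the mean of these values;
# with sample weights it becomes a weighted average instead, e.g.
# np.average(per_instance, weights=sample_weight).
batch_loss = per_instance.mean()
print(per_instance, batch_loss)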
In general, however, it is possible to reduce the losses differently, e.g., with a plain sum, but that usually performs worse, since the scale of the loss then depends on the batch size, so the mean is preferable (see this question).
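As a rough illustration of the difference (a toy simulation with made-up per-instance losses, not Keras code): with a sum, the batch loss grows with the batch size while the mean stays on the same scale, so with a sum the gradient magnitude, and hence the effective learning rate, would change whenever the batch size changes.

import numpy as np

rng = np.random.default_rng(0)

def toy_batch_loss(batch_size, reduce):
    # Pretend each instance has a per-instance cross-entropy around 2.3
    per_instance = rng.normal(loc=2.3, scale=0.1, size=batch_size)
    return reduce(per_instance)

for bs in (32, 128, 512):
    print(bs, round(toy_batch_loss(bs, np.mean), 3), round(toy_batch_loss(bs, np.sum), 3))
# The mean stays around 2.3 for every batch size, while the sum scales with it.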