I have a basic beginner question about how neural networks are defined, and I am learning in the context of the Keras library. Following the MNIST hello world program, I have defined this network:
model = Sequential()
model.add(Dense(NB_CLASSES, input_shape=(RESHAPED,), activation='softmax'))
My understanding is that this creates a neural network with two layers; in this case RESHAPED is 784 and NB_CLASSES is 10, so the network will have one input layer with 785 neurons (784 pixel inputs plus a bias) and one output layer with 10 neurons.
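For reference, here is the same snippet written out in full with the constants filled in (a minimal, self-contained version of the book's example); printing the summary shows where the 785-per-output figure comes from:

from keras.models import Sequential
from keras.layers import Dense

RESHAPED = 784   # 28 x 28 MNIST images, flattened
NB_CLASSES = 10  # one output neuron per digit class

model = Sequential()
model.add(Dense(NB_CLASSES, input_shape=(RESHAPED,), activation='softmax'))

# Reports a single Dense layer with output shape (None, 10) and
# 7850 trainable parameters: (784 weights + 1 bias) x 10 output neurons.
model.summary()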
Then I added this:
model.compile(loss='categorical_crossentropy', optimizer=OPTIMIZER, metrics=['accuracy'])
I have read up on the formula for categorical cross-entropy, but it appears to be calculated per output node. My question is: during training, how are the cross-entropy values combined into a single scalar-valued objective function? Is it just an average?
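For concreteness, this is my rough understanding of what gets computed for a single training instance, written out in plain NumPy (just an illustration of the formula with made-up numbers, not the actual Keras code):

import numpy as np

# One training instance with NB_CLASSES = 10 output nodes
y_true = np.zeros(10); y_true[3] = 1.0            # one-hot label for class 3
y_pred = np.full(10, 0.05); y_pred[3] = 0.55      # softmax output, sums to 1

# Per-node terms: -y_k * log(p_k); only the true class contributes
per_node = -y_true * np.log(y_pred)

# Summing over the output nodes gives the cross-entropy for this one instance
instance_loss = per_node.sum()
print(per_node, instance_loss)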
Keras computes the mean of the per-instance loss values, possibly weighted (see the sample_weight_mode argument if you're interested).
Here's the reference in the source code: training.py. As you can see, the result goes through K.mean(...), which ensures the result is a scalar.
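Spelled out for a toy batch (a NumPy sketch of the reduction with made-up predictions, not the actual Keras internals):

import numpy as np

# Toy batch: 3 instances, 4 classes (made-up numbers, just for illustration)
y_true = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1]], dtype=float)
y_pred = np.array([[0.7, 0.1, 0.1, 0.1],
                   [0.2, 0.5, 0.2, 0.1],
                   [0.1, 0.1, 0.2, 0.6]], dtype=float)

# Per-instance categorical cross-entropy: -sum_k y_k * log(p_k)
per_instance = -np.sum(y_true * np.log(y_pred), axis=1)   # shape (3,)

# The scalar loss reported for the batch is the mean of these values;
# with sample weights it becomes a weighted average instead, e.g.
# np.average(per_instance, weights=sample_weight).
batch_loss = per_instance.mean()
print(per_instance, batch_loss)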
In general, however, it is possible to reduce the losses differently, e.g., with a plain sum, but that usually performs worse, since the scale of the loss then depends on the batch size, so the mean is preferable (see this question).
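As a rough illustration of the difference (a toy simulation with made-up per-instance losses, not Keras code): with a sum, the batch loss grows with the batch size while the mean stays on the same scale, so with a sum the gradient magnitude, and hence the effective learning rate, would change whenever the batch size changes.

import numpy as np

rng = np.random.default_rng(0)

def toy_batch_loss(batch_size, reduce):
    # Pretend each instance has a per-instance cross-entropy around 2.3
    per_instance = rng.normal(loc=2.3, scale=0.1, size=batch_size)
    return reduce(per_instance)

for bs in (32, 128, 512):
    print(bs, round(toy_batch_loss(bs, np.mean), 3), round(toy_batch_loss(bs, np.sum), 3))
# The mean stays around 2.3 for every batch size, while the sum scales with it.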