My model is like this:
from tensorflow.keras.layers import Input, LSTM, Dense, Activation
from tensorflow.keras.models import Model

def _get_model(input_shape, latent_dim, num_classes):
    inputs = Input(shape=input_shape)
    # return_state=True makes LSTM return its output plus the final hidden and cell states
    lstm_lyr, state_h, state_c = LSTM(latent_dim, dropout=0.1, return_state=True)(inputs)
    fc_lyr = Dense(num_classes)(lstm_lyr)
    soft_lyr = Activation('relu')(fc_lyr)
    model = Model(inputs, [soft_lyr, state_c])
    model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
    return model

model = _get_model((n_steps_in, n_features), latent_dim, n_steps_out)
history = model.fit(X_train, Y_train)
During training I get:
Epoch 1/2000
1/1 [==============================] - 1s 698ms/step - loss: 0.2338 - activation_26_loss: 0.1153 - lstm_151_loss: 0.1185 - activation_26_accuracy: 0.0000e+00 - lstm_151_accuracy: 0.0000e+00 - val_loss: 0.2341 - val_activation_26_loss: 0.1160 - val_lstm_151_loss: 0.1181 - val_activation_26_accuracy: 0.0000e+00 - val_lstm_151_accuracy: 0.0000e+00
Epoch 2/2000
1/1 [==============================] - 0s 34ms/step - loss: 0.2328 - activation_26_loss: 0.1153 - lstm_151_loss: 0.1175 - activation_26_accuracy: 0.0000e+00 - lstm_151_accuracy: 0.0000e+00 - val_loss: 0.2329 - val_activation_26_loss: 0.1160 - val_lstm_151_loss: 0.1169 - val_activation_26_accuracy: 0.0000e+00 - val_lstm_151_accuracy: 0.0000e+00
Epoch 3/2000
1/1 [==============================] - 0s 38ms/step - loss: 0.2316 - activation_26_loss: 0.1153 - lstm_151_loss: 0.1163 - activation_26_accuracy: 0.0000e+00 - lstm_151_accuracy: 0.0000e+00 - val_loss: 0.2315 - val_activation_26_loss: 0.1160 - val_lstm_151_loss: 0.1155 - val_activation_26_accuracy: 0.0000e+00 - val_lstm_151_accuracy: 0.0000e+00
When I inspect the history:
print(history.history.keys())
dict_keys(['loss', 'activation_26_loss', 'lstm_151_loss', 'activation_26_accuracy', 'lstm_151_accuracy', 'val_loss', 'val_activation_26_loss', 'val_lstm_151_loss', 'val_activation_26_accuracy', 'val_lstm_151_accuracy'])
There are 3 losses: loss, activation_26_loss and lstm_151_loss, BUT 2 accuracies: activation_26_accuracy and lstm_151_accuracy. What does each loss and each accuracy stand for?

Metrics are just for the user to view and don't affect the neural network's training. Let's first try to see what you are doing here. (I am referring to the previous question you asked to get the shapes for the inputs.)
import numpy as np
from tensorflow.keras import layers, Model, utils

def _get_model(input_shape, latent_dim, num_classes):
    inputs = layers.Input(shape=input_shape)
    lstm_lyr, state_h, state_c = layers.LSTM(latent_dim, dropout=0.1, return_state=True)(inputs)
    fc_lyr = layers.Dense(num_classes)(lstm_lyr)
    soft_lyr = layers.Activation('relu')(fc_lyr)
    model = Model(inputs, [soft_lyr, state_c])  # <------- One input, 2 outputs
    model.compile(optimizer='adam', loss='mse')
    return model

# Dummy data
X = np.random.random((100, 15, 5))
y1 = np.random.random((100, 4))
y2 = np.random.random((100, 7))

model = _get_model((15, 5), 7, 4)
You are building a supervised model that takes an input of shape (15, 5) and outputs 2 things: first a (4,) vector that should contain the probability values for the 4 classes, and second a (7,) vector that should contain the cell states of the 7 LSTM units. The loss you are using to train the model for learning how to predict both of these outputs is mse.
Since this is a supervised model, you will have to provide the model with samples of inputs and outputs. If you have 100 samples, then your input would be (100, 15, 5) shaped and your outputs would be (100, 4) and (100, 7), since you have 2 outputs.
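As a minimal sketch (using the dummy arrays defined above): a multi-output model takes one target array per output, in the same order as the outputs were passed to Model.

# One target per output: y1 (100,4) for soft_lyr, y2 (100,7) for state_c
model.fit(X, [y1, y2], epochs=2, verbose=0)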
Loss(y_actual, y_pred) is a function that tells the neural network how far its prediction is from the actual value. Based on this, the network updates itself (its weights specifically, using backpropagation) so that its predictions get closer and closer to the actual values, thereby reducing the loss.
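To make this concrete, here is a small hand-computed illustration of mse (mean squared error); the numbers are made up:

import numpy as np

y_actual = np.array([1.0, 0.0, 0.0, 0.0])
y_pred = np.array([0.7, 0.1, 0.1, 0.1])
mse = np.mean((y_actual - y_pred) ** 2)  # (0.09 + 3*0.01) / 4 = 0.03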
If the above points are clear, then let's look at what this network is doing specifically.
Your current model has one input and 2 outputs.
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
Since you have defined mse as the loss, both of the outputs are trying to minimize mse. These are 2 of the 3 losses: activation_26_loss, which is the loss for the final Dense layer, and lstm_151_loss, which is the loss for the LSTM cell state. Keras just gives these layers auto-generated names with numbers unless they are named explicitly.
The loss mentioned is basically the weighted average of the other 2 losses. I'll talk about this more later.
The metrics=['accuracy'] is just a metric for users to track. Since there are 2 outputs, you get 2 different accuracy metrics, one for each output. They don't affect the neural network's training.
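As a sketch (not part of your original code; _get_named_model is a hypothetical variant): if you give the output layers explicit names, the logged losses and metrics pick up those names instead of auto-generated ones like activation_26.

from tensorflow.keras import layers, Model

def _get_named_model(input_shape, latent_dim, num_classes):
    inputs = layers.Input(shape=input_shape)
    # Naming the layers makes the logged losses/metrics readable
    lstm_lyr, state_h, state_c = layers.LSTM(
        latent_dim, dropout=0.1, return_state=True, name='state_out')(inputs)
    fc_lyr = layers.Dense(num_classes)(lstm_lyr)
    soft_lyr = layers.Activation('relu', name='class_out')(fc_lyr)
    model = Model(inputs, [soft_lyr, state_c])
    model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
    return model
# Training now logs class_out_loss, state_out_loss,
# class_out_accuracy and state_out_accuracy instead of the numbered names.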
Now, when working with neural networks, it's important to know which loss to use where. Here is a table describing what loss and activation functions to use for which type of network.
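In brief, the standard pairings are:

Problem type                  Output activation   Loss function
Regression                    linear              mse
Binary classification         sigmoid             binary_crossentropy
Multi-class classification    softmax             categorical_crossentropy
Multi-label classification    sigmoid             binary_crossentropy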
As you can see, it's good practice to use softmax and categorical_crossentropy for multi-class problems. So let's try to recreate the model with this change. We want each output to have a different loss to minimize.
Also, let's say the second output is more important than the first. We can tell the model how to weight the losses so that it prioritizes which loss to focus on more, and by how much.
import numpy as np
from tensorflow.keras import layers, Model, utils

def _get_model(input_shape, latent_dim, num_classes):
    inputs = layers.Input(shape=input_shape)
    lstm_lyr, state_h, state_c = layers.LSTM(latent_dim, dropout=0.1, return_state=True)(inputs)
    fc_lyr = layers.Dense(num_classes)(lstm_lyr)
    soft_lyr = layers.Activation('softmax')(fc_lyr)  # <--- Softmax activation for the first output
    model = Model(inputs, [soft_lyr, state_c])
    model.compile(optimizer='adam',
                  loss=['categorical_crossentropy', 'mse'],  # <--- 2 losses, one for each output
                  loss_weights=[0.4, 0.6])                   # <--- 2 loss weights for the final loss
    return model

# Dummy data
X = np.random.random((100, 15, 5))
y1 = np.random.random((100, 4))
y2 = np.random.random((100, 7))

model = _get_model((15, 5), 7, 4)
utils.plot_model(model, show_layer_names=False, show_shapes=True)
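A quick usage sketch (note that for a real problem, y1 should hold one-hot class labels rather than random floats, since categorical_crossentropy expects a probability distribution per sample):

history = model.fit(X, [y1, y2], epochs=3, verbose=0)
# history.history now contains 'loss' plus one loss entry per output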
Here, the final loss (named simply loss) is the combination of the 2 separate losses, combined with the 0.4 and 0.6 weights.
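For example (made-up numbers): if the categorical_crossentropy loss were 0.5 and the mse loss were 0.2, the reported loss would be 0.4*0.5 + 0.6*0.2 = 0.32.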
Hope this clarifies what you are trying to achieve.
ON A SIDE NOTE: I am curious as to how you are getting the actual values for the final cell state to train the model to predict a cell state. Do let me know if that is your intention. It's not very clear what your final goal here is (as I had asked in your previous question as well).