Tags: python, tensorflow, keras, autoencoder, loss-function

Keras built-in MSE loss on 2D data returns 2D matrix, not scalar loss


I'm trying to evaluate the MSE loss of individual 2D test samples with an autoencoder (AE) in Keras once the model is trained, and I'm surprised that calling Keras' built-in MSE function on individual samples returns 2D tensors. That means the loss function computes one loss per pixel for each sample, not one loss per sample as it (arguably) should. To be perfectly clear, I expected MSE to associate to each 2D sample the mean of the squared errors computed over all of its pixels (as I've read in this SO post).

Since I didn't manage to get an array of scalar MSE errors (one per test sample) after training my AE using .predict() and .evaluate() (perhaps I missed something there as well), I went on to use keras.losses.mean_squared_error() directly, sample by sample. This returned a 2D tensor as the loss for each sample (input tensors are of size (N, M, 1)). Keras' original implementation of the MSE loss reads:

def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

The axis=-1 explains why the loss isn't immediately reduced to a scalar: the mean is taken only over the last dimension, leaving all the other dimensions intact.
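This reduction behavior can be reproduced with a NumPy stand-in for the Keras backend call (the shapes below are hypothetical, chosen to match the (15, 800) samples mentioned later):

```python
import numpy as np

# Hypothetical batch: 4 samples of 15x800 pixels with one channel.
y_true = np.random.rand(4, 15, 800, 1)
y_pred = np.random.rand(4, 15, 800, 1)

# Same reduction as Keras' mean_squared_error: mean over the last axis only.
per_pixel_mse = np.mean(np.square(y_pred - y_true), axis=-1)

print(per_pixel_mse.shape)  # (4, 15, 800): one value per pixel, not per sample
```

Because the last axis here is the singleton channel axis, the "mean" collapses only that dimension, which is why each sample still comes back as a 2D array of per-pixel losses.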

I therefore wonder:

  1. What exactly has my model been using during training? Was it the mean of squared errors over all pixels of each sample, as I expected? That isn't what the built-in code suggests.
  2. Do I absolutely need to re-define the MSE loss to get an individual MSE loss for each test sample? To obtain a scalar, I would then have to flatten each sample and its associated prediction, and re-apply the built-in MSE, sample by sample.

Manually flattening before computing the MSE seems to be what needs to be done, according to this SO answer on Keras' MSE loss. Using MSE for an AE model with 2D data had seemed fine to me, since I had read this keras.io MNIST denoising tutorial.
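For reference, the flattening workaround can be sketched in NumPy as follows (the shapes are hypothetical stand-ins for my data): reshaping each sample to 1D makes the last-axis mean cover all pixels at once, yielding one scalar per sample.

```python
import numpy as np

# Hypothetical 2D samples and their reconstructions.
y_true = np.random.rand(4, 15, 800, 1)
y_pred = np.random.rand(4, 15, 800, 1)

# Flatten each sample to a single axis so the last-axis mean
# (the reduction mean_squared_error performs) covers every pixel.
flat_true = y_true.reshape(y_true.shape[0], -1)
flat_pred = y_pred.reshape(y_pred.shape[0], -1)

per_sample_mse = np.mean(np.square(flat_pred - flat_true), axis=-1)

print(per_sample_mse.shape)  # (4,): one scalar MSE per sample
```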

My code:

import keras

AE_testOutputs = autoencoder.predict(samplesList)

samplesMSE = []
for testSampleIndex in range(samplesList.shape[0]):
    AE_output = AE_testOutputs[testSampleIndex, :, :, :]
    sampleLoss = keras.losses.mean_squared_error(
        samplesList[testSampleIndex, :, :, :], AE_output)
    samplesMSE.append(sampleLoss)

Which returns a list samplesMSE of Tensor("Mean:0", shape=(15, 800), dtype=float64) objects.

I'm sorry if I missed a similar question; I did actively search before posting, but I'm still afraid there is a very simple explanation, or a built-in function I must have missed somewhere.


Solution

  • Although it is not absolutely required, Keras loss functions are conventionally defined "per-sample", where "sample" is basically each element in the output tensor of the model. The loss function is then passed through a wrapping function, weighted_masked_objective, which adds support for masking and sample weighting. By default, the total loss is the average of the per-sample losses.

    If you want the mean of some value across every dimension but the first one, you can simply apply K.mean to the value that you get, passing the remaining axes explicitly.
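    As a sketch (using NumPy to stand in for the Keras backend, with hypothetical shapes), averaging the squared error over every axis except the first (batch) axis gives one scalar per sample; with the backend, the equivalent call would be K.mean(K.square(y_pred - y_true), axis=[1, 2, 3]):

```python
import numpy as np

y_true = np.random.rand(4, 15, 800, 1)
y_pred = np.random.rand(4, 15, 800, 1)

# Mean over every axis except axis 0 (the batch axis), mirroring
# K.mean(K.square(y_pred - y_true), axis=[1, 2, 3]) in the Keras backend.
per_sample_mse = np.mean(np.square(y_pred - y_true), axis=(1, 2, 3))

print(per_sample_mse.shape)  # (4,)

# Cross-check: the first entry matches a direct per-sample computation.
assert np.isclose(per_sample_mse[0], np.mean((y_pred[0] - y_true[0]) ** 2))
```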