I'm trying to evaluate the MSE loss for individual 2D test samples with an autoencoder (AE) in Keras once the model is trained, and I'm surprised that when I call Keras' built-in MSE function to get individual samples' losses, it returns 2D tensors. That means the loss function computes one loss per pixel for each sample, not one loss per sample as I expected. To be perfectly clear, I expected MSE to associate to each 2D sample the mean of the squared errors computed over all of its pixels (as I've read on this SO post).
Since I didn't manage to get an array of scalar MSE errors, with one scalar per test sample, after training my AE using .predict() and .evaluate() (perhaps I missed something there as well), I went on trying to use keras.losses.mean_squared_error() directly, sample by sample. This returned a 2D tensor as the loss for each sample (input tensors are of size (N, M, 1)). When one looks at Keras' original implementation of the MSE loss, one finds:
def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
The axis=-1 explains why multiple dimensions aren't immediately reduced to a scalar when computing the loss: only the last axis is averaged, so every other dimension survives in the result.
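To double-check this shape behaviour, I ran a quick test (a sketch, assuming the TF 1.x-era Keras API I'm using; the shapes match my samples):

import numpy as np
from keras import backend as K
from keras.losses import mean_squared_error

y_true = K.constant(np.random.rand(15, 800, 1))  # one (N, M, 1) sample
y_pred = K.constant(np.random.rand(15, 800, 1))

loss = mean_squared_error(y_true, y_pred)
print(K.int_shape(loss))  # (15, 800): only the last axis was averaged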
I therefore wonder: is manually flattening each sample before computing the MSE what needs to be done, as suggested in this SO answer on Keras' MSE loss? Using MSE for an AE model with 2D data seemed fine to me, since that is what this keras.io MNIST denoising tutorial does.
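If flattening is indeed the way, I assume it would look something like this (a sketch; flat_mse is my own illustrative name, assuming inputs shaped (batch, N, M, 1)):

from keras import backend as K
from keras.losses import mean_squared_error

def flat_mse(y_true, y_pred):
    # Collapse every axis except the batch axis, so that the
    # axis=-1 mean in mean_squared_error covers all pixels at once.
    return mean_squared_error(K.batch_flatten(y_true),
                              K.batch_flatten(y_pred))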
My code:
import keras

# Reconstruct all test samples with the trained AE
AE_testOutputs = autoencoder.predict(samplesList)

# Compute the loss sample by sample
samplesMSE = []
for testSampleIndex in range(samplesList.shape[0]):
    AE_output = AE_testOutputs[testSampleIndex, :, :, :]
    samplesMSE.append(keras.losses.mean_squared_error(
        samplesList[testSampleIndex, :, :, :], AE_output))
This returns a list samplesMSE of Tensor("Mean:0", shape=(15, 800), dtype=float64) objects.
I'm sorry if I missed a similar question; I did actively research before posting, and I'm still afraid there is a very simple explanation, or a built-in function somewhere that I must have missed.
Although it is not absolutely required, Keras loss functions are conventionally defined "per-sample", where "sample" is basically each element in the output tensor of the model. The loss function is then passed through a wrapping function, weighted_masked_objective, that adds support for masking and sample weighting. By default, the total loss is the average of the per-sample losses.
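For instance, here is a rough sketch of what that reduction amounts to when no masks or sample weights are involved (plain NumPy; samplesList and AE_testOutputs are the arrays from your code):

import numpy as np

# Per-element squared error, shape (num_samples, N, M, 1)
per_element = np.square(AE_testOutputs - samplesList)

# What the loss function returns: the mean over the last axis only
per_sample = per_element.mean(axis=-1)  # shape (num_samples, N, M)

# What model.evaluate() ultimately reports: the mean over everything
total_loss = per_sample.mean()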
If you want the mean of some value across every dimension but the first one, you can simply apply K.mean over the value that you get.
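Applied to your setup, a sketch could look like this (using the array names from your question; the K.eval call assumes the TF 1.x-style Keras backend):

import numpy as np
from keras import backend as K

# Average the squared error over every axis except the batch axis
err = K.square(K.constant(AE_testOutputs) - K.constant(samplesList))
mse_per_sample = K.eval(K.mean(err, axis=[1, 2, 3]))  # shape (num_samples,)

# Equivalent computation in plain NumPy, with no symbolic tensors involved
mse_per_sample_np = np.square(AE_testOutputs - samplesList).mean(axis=(1, 2, 3))

Either way you end up with an ordinary 1D array holding one scalar MSE per test sample, rather than a list of unevaluated tensors.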