Tags: tensorflow, keras, loss-function

Custom loss function works even though dimensions mismatch



I'm using Keras/TF with the following model:

from keras.layers import Conv2D
from keras.models import Model
from keras.optimizers import Adam
import keras

conv = Conv2D(4, 3, activation=None, use_bias=True)(inputs)  # `inputs` is the model's Input tensor
conv = Conv2D(2, 1, activation=None, use_bias=True)(conv)
model = Model(inputs=inputs, outputs=conv)
model.compile(optimizer=Adam(lr=1e-4), loss=keras.losses.mean_absolute_error)

In model.fit, I get an error saying:

ValueError: Error when checking target: expected conv2d_2 to have shape (300, 320, 2) but got array with shape (300, 320, 1)

This is as expected because the targets are single channel images whereas the last layer in the model has 2 channels.

What I don't understand is why when I use a custom loss function:

def my_loss2(y_true, y_pred):
    return keras.losses.mean_absolute_error(y_true, y_pred)

and compile the model:

model.compile(optimizer = Adam(lr=1e-4), loss=my_loss2)

it does work (or at least, it does not raise the error). Is there some kind of automatic conversion/truncation going on?

I'm using TF (CPU) 1.12.0 and Keras 2.2.2.

Sincerely, Elad


Solution

  • Why is the behavior different for built-in and custom losses?

    It turns out that Keras performs an upfront shape check only for the built-in loss functions defined in the losses module.

    In the source code of Model._standardize_user_data, which is called by fit, I found this comment:

    # If `loss_fn` is not a function (e.g. callable class)
    # or if it not in the `losses` module, then
    # it is a user-defined loss and we make no assumptions
    # about it.
    

    In the code around that comment you can see that, depending on whether the loss function is built-in or custom, the output shape either is or is not passed on to an inner call of standardize_input_data. If the output shape is passed, standardize_input_data raises the error message you are getting.
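    The effect of that gate can be sketched in plain Python. This is a simplified, illustrative stand-in for the decision Keras makes, not the actual source; the function and variable names below are invented for the sketch:

```python
# Illustrative sketch: targets are shape-checked only when the loss is one
# of the functions defined in the `losses` module. All names here are
# stand-ins, not the real Keras internals.

def mean_absolute_error(y_true, y_pred):
    """Stand-in for keras.losses.mean_absolute_error."""
    raise NotImplementedError

LOSSES_MODULE_FUNCTIONS = {mean_absolute_error}

def target_shape_to_check(loss_fn, model_output_shape):
    """Return the shape targets must match, or None to skip the check."""
    if callable(loss_fn) and loss_fn in LOSSES_MODULE_FUNCTIONS:
        # Built-in loss: assume y_true must have the model's output shape.
        return model_output_shape
    # User-defined loss: no assumptions are made, so no upfront check.
    return None

def my_loss2(y_true, y_pred):
    # The wrapper from the question: same computation, but an unknown
    # function as far as the check is concerned.
    return mean_absolute_error(y_true, y_pred)

print(target_shape_to_check(mean_absolute_error, (300, 320, 2)))  # (300, 320, 2)
print(target_shape_to_check(my_loss2, (300, 320, 2)))             # None
```

    Wrapping the built-in in `my_loss2` is enough to make it fall into the "user-defined" branch, which is exactly why the error disappears.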

    And I think this behavior makes some sense: without knowing the implementation of a loss function, you cannot know its shape requirements; someone may invent a loss function that needs differently shaped arguments. On the other hand, the docs clearly say that the loss function's parameters must have the same shape:

    y_true: True labels. TensorFlow/Theano tensor.

    y_pred: Predictions. TensorFlow/Theano tensor of the same shape as y_true.

    So I find this a little inconsistent...
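    If you want a custom loss to fail just as early, you can add the shape guard yourself. Below is a NumPy stand-in illustrating the idea (in an actual Keras loss you would compare the static shapes via the backend, e.g. `K.int_shape`; this is a sketch, not built-in behavior):

```python
import numpy as np

def mae_strict(y_true, y_pred):
    # Illustrative NumPy version of mean absolute error that refuses
    # silently broadcast shapes, mimicking the upfront check Keras does
    # for its built-in losses.
    if y_true.shape != y_pred.shape:
        raise ValueError(
            "Target shape %s does not match prediction shape %s"
            % (y_true.shape, y_pred.shape)
        )
    return np.mean(np.abs(y_pred - y_true), axis=-1)

ok = mae_strict(np.zeros((2, 3, 2)), np.ones((2, 3, 2)))
print(ok.shape)  # (2, 3)

try:
    mae_strict(np.zeros((2, 3, 1)), np.ones((2, 3, 2)))
except ValueError as e:
    print("caught:", e)
```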

  • Why does your custom loss function work with incompatible shapes?

    If you provide a custom loss, it may still work even if the shapes do not match exactly. In your case, where only the last dimension differs, I'm quite sure that broadcasting is what is happening: the trailing size-1 dimension of your targets is simply duplicated to match the two channels of the predictions.

    In many cases broadcasting is quite useful. Here, however, it likely is not, since it hides a logical error.
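    You can see this broadcasting with a small NumPy reproduction of the MAE computation. The shapes are chosen to match the question, and the reduction over the last axis mirrors Keras's `K.mean(K.abs(y_pred - y_true), axis=-1)`:

```python
import numpy as np

# Single-channel targets vs. two-channel predictions, as in the question.
y_true = np.zeros((4, 300, 320, 1), dtype=np.float32)
y_pred = np.ones((4, 300, 320, 2), dtype=np.float32)

# mean_absolute_error reduces over the last axis; NumPy (like TensorFlow)
# first broadcasts y_true's trailing size-1 dimension to match y_pred.
mae = np.mean(np.abs(y_pred - y_true), axis=-1)
print(mae.shape)  # (4, 300, 320) -- the channel mismatch is silently absorbed
```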