I fit a Keras model with a Concatenate layer to implement a compound loss. But even when I simply ignore one of the two merged components, my loss is significantly higher than the remaining component taken alone.
Or maybe there's a bug in my code...
Do you have any clue, please? Thanks!
In my real setup, I have two input sets (X1, X2) with two corresponding label sets (Y, Z), both flowing through the same model. The model must minimize binary_crossentropy over (X1, Y) and maximize conditional entropy over (X2, Z), subject to an equality constraint on the Y-predictions. For this I merge the two paths X1-Y and X2-Z with a Concatenate layer and define the corresponding custom loss. But even when I simply ignore the Z-part in the compound loss, I get very different loss values compared to the basic 1-input/1-output (X1-Y) path.
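To give a concrete idea, the compound loss over the concatenated output would look something like the rough sketch below. This is only illustrative, not my actual implementation: it uses a plain entropy term in place of the conditional entropy, ignores the equality constraint, and assumes the same [y, z] column layout as the np.stack((y, z), axis=-1) target used in the toy code further down.

import keras.backend as K

def compound_loss(yz_true, yz_pred):
    # Illustrative only: column 0 carries the Y path, column 1 the Z path
    y, yhat = yz_true[:, 0:1], yz_pred[:, 0:1]
    zhat = yz_pred[:, 1:2]
    # Supervised term on the X1 -> Y path (to be minimized)
    bce = K.binary_crossentropy(y, yhat)
    # Entropy of the Z predictions (stand-in for the conditional entropy,
    # to be maximized, hence subtracted)
    eps = K.epsilon()
    entropy = -(zhat * K.log(zhat + eps) + (1. - zhat) * K.log(1. - zhat + eps))
    # One loss value per sample: reduce over the last (output) axis, not the batch
    return K.sum(bce - entropy, axis=-1)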
Here is some (simplified) code to reproduce the problem:
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Input, Lambda, concatenate
from keras.optimizers import Adam, SGD
import keras.backend as K
import numpy as np

# A trivial custom loss on the z-labels
def loss1(z, zhat):
    return K.sum(K.square(z - zhat), axis=-1)

# Another trivial custom loss on the (y,z)-labels that just ignores y and forwards to loss1
def loss2(yz, yzhat):
    z = yz[:, 1]
    zhat = yzhat[:, 1]
    return loss1(z, zhat)

# Toy dataset
X = np.random.rand(1000, 100)
X2 = X
y = 1*(X[:, 0] > 0.5)
z = 1*(X[:, 1] > 0.5)

# Shared base model
model = Sequential()
model.add(Dense(30, input_shape=[X.shape[1]], activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# 2 inputs (X, X2), 2 outputs (Y, Z); the identity Lambdas only name the outputs
inY = Input([X.shape[1]], name="X")
outY = Lambda(lambda x: x, name="Y")(model(inY))
inZ = Input([X2.shape[1]], name="X2")
outZ = Lambda(lambda x: x, name="Z")(model(inZ))

# Take a 3rd output YZ by concatenating Y and Z
full_model = Model(inputs=[inY, inZ],
                   outputs=[outY, outZ, concatenate([outY, outZ], name='YZ')])

# Run the model with loss1 on Z and loss2 on YZ
full_model.compile(optimizer="adam",
                   loss={'Y': "binary_crossentropy", 'Z': loss1, 'YZ': loss2},
                   loss_weights={'Y': 1, 'Z': 0, 'YZ': 0})
full_model.fit([X, X2], [y, z, np.stack((y, z), axis=-1)],
               batch_size=32, epochs=100, verbose=1)
# Z_loss1 and YZ_loss2 should be equal... but instead I get:
# > Z_loss: 0.2542 - YZ_loss: 8.3113
# > Z_loss: 0.2519 - YZ_loss: 8.2832
# > Z_loss: 0.2523 - YZ_loss: 8.2477
# > Z_loss: 0.2598 - YZ_loss: 8.2236
# > ...
Z_loss1 and YZ_loss2 should be equal, but the above code yields:
Z_loss: 0.2542 - YZ_loss: 7.9963
Z_loss: 0.2519 - YZ_loss: 7.4883
Z_loss: 0.2523 - YZ_loss: 7.1448
Z_loss: 0.2598 - YZ_loss: 6.9451
Z_loss: 0.2583 - YZ_loss: 6.6104
Z_loss: 0.2621 - YZ_loss: 6.2509
The loss function is called with 2D tensors of shape (samples, outputs), and it is expected to return one loss value per sample. With
z = yz[:, 1]
zhat = yzhat[:, 1]
you slice the 2D tensors down to 1D tensors of shape (samples,), so the K.sum(..., axis=-1) inside loss1 sums over the whole batch instead of over each sample's outputs. Keras then takes the mean of that already-summed value, which is why YZ_loss starts out at roughly batch_size times Z_loss (0.25 × 32 ≈ 8).
If you preserve the tensor dimensionality:
z=yz[:,1:]
zhat=yzhat[:,1:]
Then the YZ loss matches the Z loss exactly:
Epoch 1/5
1000/1000 [==============================] - 1s 1ms/step - loss: 0.7100 - Y_loss: 0.7100 - Z_loss: 0.2617 - YZ_loss: 0.2617
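If you want to see the difference in isolation, here is a small standalone check with the Keras backend (just an illustration, not part of the model above):

import numpy as np
import keras.backend as K

yz = K.constant(np.array([[0., 1.], [1., 0.], [1., 1.]]))  # a "batch" of 3 samples

sliced_1d = yz[:, 1]    # shape (3,)   -> the last axis is now the batch axis
sliced_2d = yz[:, 1:]   # shape (3, 1) -> the last axis is still the per-sample output axis

print(K.int_shape(sliced_1d))                       # (3,)
print(K.int_shape(sliced_2d))                       # (3, 1)
print(K.eval(K.sum(K.square(sliced_1d), axis=-1)))  # 2.0 -> one number for the whole batch
print(K.eval(K.sum(K.square(sliced_2d), axis=-1)))  # [1. 0. 1.] -> one loss per sample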