python, neural-network, keras, theano, loss-function

Max-margin loss in Keras/theano


I want to train a neural network in Keras (with Theano as the backend) with a max-margin loss function, using one negative sample per positive sample:

 max(0, 1 - pos_score + neg_score)

I have a neural network which takes two arguments i and j and returns a score base(i, j). For a given i, I have a positive sample j and a negative sample k. So I want to compute the following:

 max(0, 1 - base(i, j) + base(i, k))
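
For concreteness, here is a small standalone NumPy sketch of this per-pair hinge (the function name margin_loss and the example scores are purely illustrative, not part of the original code):

import numpy as np

def margin_loss(pos_score, neg_score, margin=1.0):
    # hinge: zero once the positive score beats the negative score by at least the margin
    return np.maximum(0.0, margin - pos_score + neg_score)

print(margin_loss(2.5, 0.3))  # 0.0 -> margin satisfied, no loss
print(margin_loss(0.2, 0.4))  # 1.2 -> margin violated, positive loss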

At an abstract level, my code looks like the following:

from keras.models import Sequential, Model
from keras.layers import Input, Dense, Merge
from keras import backend as K

i = Input(shape=(100,))  # d=100
j = Input(shape=(300,))  # d=300
k = Input(shape=(300,))  # d=300

i_vec = Sequential()
i_vec.add(Dense(20, input_dim=100))
j_vec = Sequential()
j_vec.add(Dense(30, input_dim=300))

base = Sequential()
base.add(Merge([i_vec, j_vec], mode='concat'))
# Here goes the definition of the base network
base.add(Dense(output_dim=1, bias=False))

pos = base([i, j])
neg = base([i, k])

def custom_loss(y_true, y_pred):
    return K.maximum(0, 1 - y_pred[0] + y_pred[1])

model = Model(input=[i, j, k], output=[pos, neg])
model.compile(optimizer='adam', loss=custom_loss)  # same custom loss applied to each output
# Shape of I=(1000,100), J and K=(1000,300), XX=(1000,)
model.fit([I, J, K], [XX, XX], nb_epoch=10)

Note that XX is not actually used during training.
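
For illustration, here is a hedged sketch of how the training arrays might be built (the random I and J and the zero-filled XX are assumptions, not the original data; the negative-sample matrix is called K in the snippet above but is renamed K_neg here to avoid clashing with the backend alias K):

import numpy as np

# illustrative arrays matching the shapes noted in the comment above
I = np.random.rand(1000, 100).astype('float32')
J = np.random.rand(1000, 300).astype('float32')
K_neg = np.random.rand(1000, 300).astype('float32')

# dummy targets: custom_loss never reads y_true, so zeros are as good as anything
XX = np.zeros((1000,), dtype='float32')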

While running the code, I got the following error:

ValueError: GpuElemwise. Output dimension mismatch. Output 0 (indices start at 0), working inplace on input 0, has shape[0] == 1, but the output's size on that axis is 32.
Apply node that caused the error: GpuElemwise{Composite{(i0 * (i1 * i2))}}[(0, 0)](GpuElemwise{Composite{Cast{float32}(EQ(i0, i1))}}[(0, 0)].0, GpuElemwise{Composite{(i0 / (i1 * i2))}}[(0, 0)].0, GpuFromHost.0)
Toposort index: 83
Inputs types: [CudaNdarrayType(float32, vector), CudaNdarrayType(float32, (True,)), CudaNdarrayType(float32, vector)]
Inputs shapes: [(1,), (1,), (32,)]
Inputs strides: [(0,), (0,), (1,)]
Inputs values: [CudaNdarray([ 1.]), CudaNdarray([ 1.]), 'not shown']
Outputs clients: [[GpuIncSubtensor{InplaceInc;int64}(GpuIncSubtensor{Inc;int64}.0, GpuElemwise{Composite{(i0 * (i1 * i2))}}[(0, 0)].0, Constant{1}), GpuElemwise{neg,no_inplace}(GpuElemwise{Composite{(i0 * (i1 * i2))}}[(0, 0)].0)]]

I think the problem is in the computation of the loss function.

Note: I have tried XX both as a row vector and as a column vector, but the error remains the same.

A solution to the same problem with TensorFlow as the backend is available here and here.


Edit 1:

Changing the loss function as shown below works (that is, it runs without any error), but I neither know why nor whether the new code is correct.

def custom_loss(y_true, y_pred):
    return K.sum(K.maximum(0, 1 - y_pred[0] + y_pred[1]))

Solution

  • It seems like K.maximum(0, 1 - y_pred[0] + y_pred[1]) does not give you a scalar loss value, but rather the per-sample error. You need to reduce the loss over the entire minibatch, and wrapping it in K.sum reduces the per-sample losses to a single per-minibatch scalar. It would arguably be more accurate to use K.mean instead of K.sum, so that the loss scale does not change if you decide to change the batch size (see the sketch below).
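
A minimal sketch of the mean-based variant suggested above, as a drop-in replacement for custom_loss (the rest of the model setup is assumed unchanged):

from keras import backend as K

def custom_loss(y_true, y_pred):
    # reduce the per-sample hinge values to a single scalar per minibatch;
    # the mean keeps the loss scale independent of the batch size
    return K.mean(K.maximum(0., 1. - y_pred[0] + y_pred[1]))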