Keras: handling the batch size dimension for a custom Pearson correlation metric

I want to create a custom metric for Pearson correlation as defined here.

I'm not sure exactly how to apply it to batches of y_pred and y_true.

What I did:

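# Assumed imports (the original post does not show them).
import tensorflow as tf
from keras import backend as K

def pearson_correlation_f(y_true, y_pred):
    # Drop the first column, then keep the first half of the remaining columns.
    y_true, _ = tf.split(y_true[:, 1:], 2, axis=1)
    y_pred, _ = tf.split(y_pred[:, 1:], 2, axis=1)

    # Center each sample along its last axis.
    fsp = y_pred - K.mean(y_pred, axis=-1, keepdims=True)
    fst = y_true - K.mean(y_true, axis=-1, keepdims=True)

    # Batch mean of the per-sample numerators divided by
    # the batch mean of the per-sample denominators.
    corr = K.mean(K.sum(fsp * fst, axis=-1)) / K.mean(
        K.sqrt(K.sum(K.square(fsp), axis=-1) * K.sum(K.square(fst), axis=-1)))

    return corr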

Is it necessary for me to use keepdims, handle the batch dimension manually, and then take the mean over it? Or does Keras somehow do this automatically?

Solution

When you use K.mean without an axis, Keras calculates the mean over every element of the tensor, batch dimension included, and returns a single scalar.
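To make the axis behaviour concrete, here is a small plain-Python sketch (no Keras involved; the nested list `batch` and the helper names are made up for illustration) of what reducing with no axis versus reducing with `axis=0` corresponds to:

```python
# Hypothetical batch: 2 samples, 3 features each.
batch = [[1.0, 2.0, 3.0],
         [5.0, 6.0, 7.0]]

def mean_all(rows):
    """Mean over every element, like K.mean(x) with no axis: one scalar."""
    flat = [v for row in rows for v in row]
    return sum(flat) / len(flat)

def mean_axis0(rows):
    """Mean over the batch dimension only, like K.mean(x, axis=0): one value per feature."""
    return [sum(col) / len(col) for col in zip(*rows)]

print(mean_all(batch))    # 4.0
print(mean_axis0(batch))  # [3.0, 4.0, 5.0]
```

The first form mixes all features together; the second keeps one statistic per feature.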

And the backend already has standard deviation functions, so it might be cleaner (and perhaps faster) to use them.

If your true data is shaped like (BatchSize, 1), I'd say keepdims is unnecessary. Otherwise I'm not sure, and it would be good to test the results.

(I don't understand why you use split, but it also seems unnecessary.)

So, I'd try something like this:

def pearson_correlation_f(y_true, y_pred):
    # K.mean without an axis returns a scalar, so it is
    # automatically subtracted from every element.
    fsp = y_pred - K.mean(y_pred)
    fst = y_true - K.mean(y_true)

    # Standard deviations over all elements.
    devP = K.std(y_pred)
    devT = K.std(y_true)

    return K.mean(fsp * fst) / (devP * devT)
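As a sanity check on the formula itself, here is a plain-Python sketch of the same arithmetic (mean of the product of centered values, divided by the product of the population standard deviations). It does not use Keras; `pearson_all` and the sample data are hypothetical names chosen for illustration.

```python
import math

def pearson_all(y_true, y_pred):
    """Pearson correlation over all elements of two equally shaped nested lists."""
    t = [v for row in y_true for v in row]
    p = [v for row in y_pred for v in row]
    n = len(t)
    mean_t, mean_p = sum(t) / n, sum(p) / n
    # Covariance and standard deviations all divide by n (population form).
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(t, p)) / n
    std_t = math.sqrt(sum((a - mean_t) ** 2 for a in t) / n)
    std_p = math.sqrt(sum((b - mean_p) ** 2 for b in p) / n)
    return cov / (std_t * std_p)

y_true = [[1.0], [2.0], [3.0], [4.0]]
perfect = [[2.0], [4.0], [6.0], [8.0]]    # exact increasing linear function of y_true
reversed_ = [[4.0], [3.0], [2.0], [1.0]]  # exact decreasing linear function of y_true

print(pearson_all(y_true, perfect))    # close to 1.0
print(pearson_all(y_true, reversed_))  # close to -1.0
```

Note that a constant input makes the denominator zero. In a Keras metric, adding a small constant such as K.epsilon() to the denominator is one common guard, though whether you need it depends on your data.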

If it's relevant to have the loss for each feature instead of putting them all in the same group:

def pearson_correlation_f(y_true, y_pred):
    # original shapes: (batch, 10)

    # Take the mean over the batch, keeping the features separate.
    fsp = y_pred - K.mean(y_pred, axis=0)
    fst = y_true - K.mean(y_true, axis=0)
    # mean shape: (10,), which broadcasts against (batch, 10)
    # fsp and fst keep shape (batch, 10)

    devP = K.std(y_pred, axis=0)
    devT = K.std(y_true, axis=0)
    # dev shape: (10,)

    # The per-feature correlations have shape (10,).
    # The sum is only necessary because we need a single loss value.
    return K.sum(K.mean(fsp * fst, axis=0) / (devP * devT))

Summing the results for the ten features and taking their mean are equivalent up to a constant factor: the sum is 10 times the mean. (This is not very relevant to Keras models. It only rescales the loss, which acts like a change in learning rate, and many optimizers quickly find their way around this.)
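The per-feature variant and the sum-versus-mean remark can be checked with a plain-Python sketch. This is only an illustration of the arithmetic, not Keras code; `pearson_per_feature` and the tiny two-feature batch are hypothetical.

```python
import math

def pearson_per_feature(y_true, y_pred):
    """One Pearson correlation per column (feature), computed over the batch."""
    corrs = []
    # zip(*rows) transposes (batch, features) into one tuple per feature.
    for t, p in zip(zip(*y_true), zip(*y_pred)):
        n = len(t)
        mean_t, mean_p = sum(t) / n, sum(p) / n
        cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(t, p)) / n
        std_t = math.sqrt(sum((a - mean_t) ** 2 for a in t) / n)
        std_p = math.sqrt(sum((b - mean_p) ** 2 for b in p) / n)
        corrs.append(cov / (std_t * std_p))
    return corrs

# Hypothetical batch of 3 samples with 2 features.
y_true = [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]]
y_pred = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]

corrs = pearson_per_feature(y_true, y_pred)
total = sum(corrs)
average = total / len(corrs)
print(corrs)           # one correlation per feature
print(total, average)  # total equals average times the number of features
```

Here the first feature is perfectly correlated and the second is negatively correlated, which a single correlation over all elements would blur together.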