I have a question concerning the implementation of a correlation-based loss function for a sequence labelling task in Keras (TensorFlow backend).
Suppose we have a sequence labelling problem where the input is a tensor of shape (20,100,5) and the output is a tensor of shape (20,100,1). The documentation says that the loss function needs to return a "scalar for each data point". For the loss between tensors of shape (20,100,1), the default MSE loss returns a loss tensor of shape (20,100).
Now, if we use a loss function based on the correlation coefficient for each sequence, in theory, we will get only a single value for each sequence, i.e., a tensor of shape (20,).
However, when I use this in Keras as a loss function, fit() returns an error because a tensor of shape (20,100) is expected. On the other hand, there is no error when I either return only the mean of the per-sequence correlation losses (a scalar) or repeat each sequence's loss for every sample within the sequence (shape (20,100)). In both cases the framework does not complain (TensorFlow backend), the loss decreases over the epochs, and the performance on independent test data is good.
My questions are: Which output shape is a custom loss function actually supposed to return for such a sequence labelling model? And why does fit() accept a scalar or a (20,100) tensor, but not the per-sequence (20,) tensor?
Please find below an executable example with my implementations of the correlation-based loss functions. my_loss_1 returns only the mean value of the correlation coefficients over all (20) sequences. my_loss_2 returns one loss per sequence (this is the variant that does not work in real training). my_loss_3 repeats each sequence's loss for every sample within the sequence.
Many thanks and best wishes
from keras import backend as K
from keras.losses import mean_squared_error
import numpy as np
import tensorflow as tf
def my_loss_1(seq1, seq2):  # Correlation-based loss - version 1 - returns a scalar
    seq1 = K.squeeze(seq1, axis=-1)
    seq2 = K.squeeze(seq2, axis=-1)
    seq1_mean = K.mean(seq1, axis=-1, keepdims=True)
    seq2_mean = K.mean(seq2, axis=-1, keepdims=True)
    numerator = K.sum((seq1 - seq1_mean) * (seq2 - seq2_mean), axis=-1)
    denominator = K.sqrt(K.sum(K.square(seq1 - seq1_mean), axis=-1) * K.sum(K.square(seq2 - seq2_mean), axis=-1))
    corr = numerator / (denominator + K.epsilon())
    corr_loss = K.constant(1.) - corr
    corr_loss = K.mean(corr_loss)  # average over the batch -> scalar
    return corr_loss
def my_loss_2(seq1, seq2):  # Correlation-based loss - version 2 - returns a 1D array
    seq1 = K.squeeze(seq1, axis=-1)
    seq2 = K.squeeze(seq2, axis=-1)
    seq1_mean = K.mean(seq1, axis=-1, keepdims=True)
    seq2_mean = K.mean(seq2, axis=-1, keepdims=True)
    numerator = K.sum((seq1 - seq1_mean) * (seq2 - seq2_mean), axis=-1)
    denominator = K.sqrt(K.sum(K.square(seq1 - seq1_mean), axis=-1) * K.sum(K.square(seq2 - seq2_mean), axis=-1))
    corr = numerator / (denominator + K.epsilon())
    corr_loss = K.constant(1.) - corr  # one loss per sequence -> shape (batch,)
    return corr_loss
def my_loss_3(seq1, seq2):  # Correlation-based loss - version 3 - returns a 2D array
    seq1 = K.squeeze(seq1, axis=-1)
    seq2 = K.squeeze(seq2, axis=-1)
    seq1_mean = K.mean(seq1, axis=-1, keepdims=True)
    seq2_mean = K.mean(seq2, axis=-1, keepdims=True)
    numerator = K.sum((seq1 - seq1_mean) * (seq2 - seq2_mean), axis=-1)
    denominator = K.sqrt(K.sum(K.square(seq1 - seq1_mean), axis=-1) * K.sum(K.square(seq2 - seq2_mean), axis=-1))
    corr = numerator / (denominator + K.epsilon())
    corr_loss = K.constant(1.) - corr
    corr_loss = K.reshape(corr_loss, (-1, 1))
    # Repeat the per-sequence loss for every timestep. Does not work in fit():
    # K.int_shape() returns None for dimensions that are not statically known.
    corr_loss = K.repeat_elements(corr_loss, K.int_shape(seq1)[1], 1)
    return corr_loss
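# For context (my own note, not part of the original test below): in training,
# these functions would be passed to compile() like any built-in Keras loss,
# e.g. for some hypothetical, already-defined model:
#   model.compile(optimizer='adam', loss=my_loss_3)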
# Test
sess = tf.Session()
# input (20,100,1)
a1 = np.random.rand(20,100,1)
a2 = np.random.rand(20,100,1)
print('\nInput: ' + str(a1.shape))
p1 = K.placeholder(shape=a1.shape, dtype=tf.float32)
p2 = K.placeholder(shape=a2.shape, dtype=tf.float32)
loss0 = mean_squared_error(p1, p2)
print('\nMSE:') # output shape: (20,100)
print(sess.run(loss0, feed_dict={p1: a1, p2: a2}))
loss1 = my_loss_1(p1, p2)
print('\nCorrelation loss (my_loss_1):') # output shape: ()
print(sess.run(loss1, feed_dict={p1: a1, p2: a2}))
loss2 = my_loss_2(p1, p2)
print('\nCorrelation loss (my_loss_2):') # output shape: (20,)
print(sess.run(loss2, feed_dict={p1: a1, p2: a2}))
loss3 = my_loss_3(p1, p2)
print('\nCorrelation loss (my_loss_3):') # output shape: (20,100)
print(sess.run(loss3, feed_dict={p1: a1, p2: a2}))
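# Sanity check (my own addition): averaging the repeated per-sequence losses
# of my_loss_3 over all elements should reproduce the scalar from my_loss_1.
print('\nMean of my_loss_3 output (should match my_loss_1):')
print(np.mean(sess.run(loss3, feed_dict={p1: a1, p2: a2})))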
You wrote: "Now, if we use a loss function based on the correlation coefficient for each sequence, in theory, we will get only a single value for each sequence, i.e., a tensor of shape (20,)."
That's not true. The (Pearson) coefficient is something like
mean((label - mean(label)) * (prediction - mean(prediction))) / (std(label) * std(prediction))
Remove the overall average and you are left with the components of the correlation coefficient, one per element of the sequence, which is exactly the right shape. You can plug in other correlation formulas as well; just stop before reducing to the single value.
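Here is a minimal sketch of that idea, reusing the backend ops from your question (the name my_loss_4 and the use of K.std are my choices; treat it as a starting point, not the one true implementation):

def my_loss_4(seq1, seq2):  # Per-element correlation components - returns a 2D array
    seq1 = K.squeeze(seq1, axis=-1)
    seq2 = K.squeeze(seq2, axis=-1)
    seq1_mean = K.mean(seq1, axis=-1, keepdims=True)
    seq2_mean = K.mean(seq2, axis=-1, keepdims=True)
    # One correlation component per timestep; no sum/mean over the time axis.
    numerator = (seq1 - seq1_mean) * (seq2 - seq2_mean)  # shape (batch, time)
    denominator = K.std(seq1, axis=-1, keepdims=True) * K.std(seq2, axis=-1, keepdims=True)
    corr_elems = numerator / (denominator + K.epsilon())
    return K.constant(1.) - corr_elems  # shape (batch, time), as fit() expects

Averaging this output over the time axis gives 1 minus the per-sequence Pearson coefficient (with the population standard deviation), so Keras's internal mean reduction should end up optimizing the same quantity as my_loss_1 while the loss tensor keeps the (batch, time) shape that fit() expects.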