Tags: python · tensorflow · keras · loss-function

Keras Custom Loss Function InvalidArgumentError: In[1] is not a matrix. Instead it has shape []


I’m trying to use the Spearman rank correlation coefficient to write a custom loss function. I want to compute the Spearman rank correlation coefficient between each pair of y_true and y_pred samples (each sample is an array of 8 elements; e.g., [1 2 3 4 5 6 7 8] and [3 2 1 4 5 8 6 7]).
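For reference, the coefficient for the example pair above can be computed by hand with the classic rank-difference formula (valid here because neither sample has ties; `spearman_rho` is just an illustrative name):

```python
def spearman_rho(a, b):
    # Classic Spearman formula: 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    # where d_i is the difference between the ranks of a[i] and b[i].
    # Only valid when there are no ties in either sequence.
    rank = lambda xs: [sorted(xs).index(x) + 1 for x in xs]
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank(a), rank(b)))
    n = len(a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(spearman_rho([1, 2, 3, 4, 5, 6, 7, 8], [3, 2, 1, 4, 5, 8, 6, 7]))  # ≈ 0.8333
```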

I have followed the indications of this answer (How to compute Spearman correlation in Tensorflow) and the Keras documentation (https://keras.io/api/losses/), but there must be something I’m missing with regard to the output shape of the computed loss.

Training the model with this custom function produces the following error:

model.compile(loss=spearman_correlation, optimizer=tf.keras.optimizers.Adam())
model.fit(train_x, train_y,batch_size=64, epochs=2, validation_data=(test_x, test_y), callbacks=[model_checkpoint])

InvalidArgumentError:  In[1] is not a matrix. Instead it has shape []
     [[node gradient_tape/model_19/dense_19/MatMul_1 (defined at <ipython-input-46-7e6fc7cd1b39>:12) ]] [Op:__inference_train_function_300522]

I have tried a tricky workaround: I take a working example of a Keras loss function and simply add my computed values onto its zeroed-out result. Training works this way, but I don’t think this is the proper way of doing things, and I can’t see where the problem is. Looking at the prints in the custom function below, the shape and type of my loss output and of the TensorFlow loss function’s output are the same.

This is the way I’m computing the loss:

import numpy as np
from scipy import stats
import tensorflow as tf

def get_rank(y_pred):
    # Rank of each element (1-based) via sort-and-index
    temp = sorted(y_pred, reverse=False)
    res = [temp.index(i) for i in y_pred]
    res = np.array(res) + 1
    return res

def custom_spearman_correlation(y_true, y_pred):
    s_coefs = tf.map_fn(lambda k: 1-stats.spearmanr(k[0], get_rank(k[1]))[0], tf.stack([y_true, y_pred], 1), dtype=tf.float32)

    loss = s_coefs
    print("CUSTOM LOSS: ")
    print("Shape: " + str(loss.shape))
    print(type(loss))

    print("WORKING LOSS")
    squared_difference = tf.square(y_true - y_pred)
    w_loss = tf.reduce_mean(squared_difference, axis=-1)
    print("Shape: " + str(w_loss.shape))
    print(type(w_loss))

    print("TRICKY ANSWER: ")
    t_loss = w_loss*0 + loss
    print("Shape: " + str(t_loss.shape))
    print(type(t_loss))
    return loss
    #return w_loss
    #return t_loss

def spearman_correlation(y_true, y_pred):
    sp = tf.py_function(custom_spearman_correlation, [tf.cast(y_true, tf.float32), tf.cast(y_pred, tf.float32)], Tout = tf.float32)
    return (sp)

And this is the output:

CUSTOM LOSS: 
Shape: (64,)
<class 'tensorflow.python.framework.ops.EagerTensor'>
WORKING LOSS
Shape: (64,)
<class 'tensorflow.python.framework.ops.EagerTensor'>
TRICKY ANSWER: 
Shape: (64,)

Solution

  • Although I'm not sure, I think the workaround above does not let the gradients propagate properly to the model's weights, which is why my model was not learning. I have since implemented the Spearman rank correlation coefficient directly in TensorFlow, following the definition on this website (https://rpubs.com/aaronsc32/spearman-rank-correlation), and arrived at the following code (shared in case anyone finds it useful).

    import tensorflow as tf
    import tensorflow_probability as tfp

    @tf.function
    def get_rank(y_pred):
      rank = tf.argsort(tf.argsort(y_pred, axis=-1, direction="ASCENDING"), axis=-1) + 1 #+1 to get the rank starting at 1 instead of 0
      return tf.cast(rank, tf.float32) #cast: tf.argsort returns int32, but the map_fn below declares a float32 output
    
    @tf.function
    def sp_rank(x, y):
      cov = tfp.stats.covariance(x, y, sample_axis=0, event_axis=None)
      sd_x = tfp.stats.stddev(x, sample_axis=0, keepdims=False, name=None)
      sd_y = tfp.stats.stddev(y, sample_axis=0, keepdims=False, name=None)
      return 1 - cov / (sd_x * sd_y) #1 - rho, because we want to minimize the loss
    
    @tf.function
    def spearman_correlation(y_true, y_pred):
        #First we obtain the ranking of the predicted values
        y_pred_rank = tf.map_fn(lambda x: get_rank(x), y_pred, dtype=tf.float32)
        
        #Spearman rank correlation between each pair of samples:
        #Sample dim: (1, 8)
        #Batch of samples dim: (None, 8) None=batch_size=64
        #Output dim: (batch_size, ) = (64, )
        sp = tf.map_fn(lambda x: sp_rank(x[0],x[1]), (y_true, y_pred_rank), dtype=tf.float32)
        #Reduce to a single value
        loss = tf.reduce_mean(sp)
        return loss
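The double-argsort trick in `get_rank` can be sanity-checked against the naive sort-and-index ranking from the question; a plain-Python sketch (function names are mine, not from any library):

```python
def argsort(xs):
    # Indices that would sort xs ascending (same idea as tf.argsort / np.argsort)
    return sorted(range(len(xs)), key=lambda i: xs[i])

def get_rank(xs):
    # The argsort of an argsort yields each element's 0-based rank; +1 makes it 1-based
    return [r + 1 for r in argsort(argsort(xs))]

def get_rank_naive(xs):
    # The original sort-and-index approach from the question
    temp = sorted(xs)
    return [temp.index(x) + 1 for x in xs]

sample = [3.0, 2.0, 1.0, 4.0, 5.0, 8.0, 6.0, 7.0]
print(get_rank(sample))        # [3, 2, 1, 4, 5, 8, 6, 7]
print(get_rank_naive(sample))  # [3, 2, 1, 4, 5, 8, 6, 7]
```

Unlike the sort-and-index version, the double argsort runs entirely on tensor ops, which is what keeps the solution's loss inside the TensorFlow graph.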