python, tensorflow, scikit-learn, backpropagation, cosine-similarity

Tensorflow implementation of NT_Xent contrastive loss function?


As the title suggests, I'm trying to train a model based on the SimCLR framework (see this paper: https://arxiv.org/pdf/2002.05709.pdf; the NT_Xent loss is stated in equation (1) and Algorithm 1).
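
For reference, here is my transcription of equation (1) from the paper, the loss for a positive pair of examples (i, j) (please double-check it against the paper itself):

\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}

where sim(u, v) is the cosine similarity between u and v, tau is the temperature, and the total loss averages this term over both (i, j) and (j, i) for each of the N positive pairs in a batch of 2N augmented samples.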

I have managed to create a numpy version of the loss function, but it is not suitable for training the model, as Tensorflow cannot back-propagate gradients through numpy operations. I am having difficulty converting my numpy code over to Tensorflow. Here is my numpy version:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Define the contrastive loss function, NT_Xent
def NT_Xent(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT_Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf
    
    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.

    Returns:
        loss: The complete NT_Xent contrastive loss
    """
    z = np.concatenate((zi, zj), 0)

    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        sim_ij = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[j].reshape(1, -1)))
        sim_ji = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[i].reshape(1, -1)))
        numerator_ij = np.exp(sim_ij / tau)
        numerator_ji = np.exp(sim_ji / tau)

        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[np.arange(z.shape[0]) != i]))
        sim_jk = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[np.arange(z.shape[0]) != j]))
        denominator_ik = np.sum(np.exp(sim_ik / tau))
        denominator_jk = np.sum(np.exp(sim_jk / tau))

        # Calculate individual and combined losses
        loss_ij = - np.log(numerator_ij / denominator_ik)
        loss_ji = - np.log(numerator_ji / denominator_jk)
        loss += loss_ij + loss_ji
    
    # Divide by the total number of samples
    loss /= z.shape[0]

    return loss

I am fairly confident that this function produces the correct results (albeit slowly). I have seen other, vectorised implementations online, such as this one for Pytorch: https://github.com/Spijkervet/SimCLR/blob/master/modules/nt_xent.py, and my code produces the same result for identical inputs; however, I do not see how their version is mathematically equivalent to the formula in the paper, which is why I am trying to build my own.

As a first try I converted the numpy functions to their TF equivalents (tf.concat, tf.reshape, tf.math.exp, tf.range, etc.), but I believe my main problem is that sklearn's cosine_similarity function returns a numpy array, and I do not know how to build this function myself in Tensorflow. Any ideas?


Solution

  • I managed to figure it out myself! I did not realise there was a Tensorflow implementation of the cosine similarity function, "tf.keras.losses.CosineSimilarity".

    Here is my code:

    import tensorflow as tf
    
    # Define the contrastive loss function, NT_Xent (Tensorflow version)
    def NT_Xent_tf(zi, zj, tau=1):
        """ Calculates the contrastive loss of the input data using NT_Xent. The
        equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf
        (This is the Tensorflow implementation of the standard numpy version found
        in the NT_Xent function).
        
        Args:
            zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
            zj: Other half of the input data, must have the same shape as zi
            tau: Temperature parameter (a constant), default = 1.
    
        Returns:
            loss: The complete NT_Xent contrastive loss
        """
        z = tf.cast(tf.concat((zi, zj), 0), dtype=tf.float32)
        # Instantiate the cosine similarity "loss" once and reuse it for every pair;
        # note that it returns the negative cosine similarity, hence the minus signs below
        cosine_sim = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
        loss = 0
        for k in range(zi.shape[0]):
            # Numerator (sim(i,j) == sim(j,i), so it only needs to be computed once)
            i = k
            j = k + zi.shape[0]
            sim = tf.squeeze(- cosine_sim(tf.reshape(z[i], (1, -1)), tf.reshape(z[j], (1, -1))))
            numerator = tf.math.exp(sim / tau)
    
            # Denominator (compare i & j to all samples apart from themselves)
            sim_ik = - cosine_sim(tf.reshape(z[i], (1, -1)), z[tf.range(z.shape[0]) != i])
            sim_jk = - cosine_sim(tf.reshape(z[j], (1, -1)), z[tf.range(z.shape[0]) != j])
            denominator_ik = tf.reduce_sum(tf.math.exp(sim_ik / tau))
            denominator_jk = tf.reduce_sum(tf.math.exp(sim_jk / tau))
    
            # Calculate individual and combined losses
            loss_ij = - tf.math.log(numerator / denominator_ik)
            loss_ji = - tf.math.log(numerator / denominator_jk)
            loss += loss_ij + loss_ji
        
        # Divide by the total number of samples
        loss /= z.shape[0]
    
        return loss
    

    As you can see, I have essentially just swapped out the numpy functions for their TF equivalents. One main point of note is that I had to use "reduction=tf.keras.losses.Reduction.NONE" when instantiating "cosine_sim"; this keeps "sim_ik" and "sim_jk" as vectors of per-sample similarities rather than a single reduced value, and without it the resulting loss did not match up with my original numpy implementation.
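
    To illustrate that setting (a tiny standalone example, not part of the loss itself): with "Reduction.NONE" the loss object returns one value per compared row, and because "tf.keras.losses.CosineSimilarity" computes the negative cosine similarity, its outputs are negated before use:

    import tensorflow as tf

    a = tf.constant([[1.0, 0.0]])                            # one query vector, shape (1, 2)
    b = tf.constant([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])   # three candidates, shape (3, 2)

    cos_none = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
    cos_auto = tf.keras.losses.CosineSimilarity(axis=-1)

    print(-cos_none(a, b))  # one similarity per row, roughly [1., 0., -1.]
    print(-cos_auto(a, b))  # a single averaged scalar, which would collapse sim_ik and sim_jk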

    I also noticed that calculating the numerator separately for i,j and j,i was redundant, as cosine similarity is symmetric and the two values are identical, so I have removed one of those calculations.
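
    One obvious way to make it quicker would be to drop the Python loop and compute the full (2N x 2N) similarity matrix in one go, similar in spirit to the Pytorch version linked in the question. Below is a rough, unverified sketch of that idea (it re-uses the tensorflow import from the code above, the function name is my own, and I have not checked it against the loop version):

    def NT_Xent_tf_vectorised(zi, zj, tau=1):
        """ Sketch of a vectorised NT_Xent: all cosine similarities come from one
        matrix multiply of L2-normalised vectors, self-similarities are masked out
        of the denominator, and the positive pairs sit on the diagonals of the
        off-diagonal N x N blocks. """
        n = zi.shape[0]
        z = tf.cast(tf.concat((zi, zj), 0), dtype=tf.float32)
        z = tf.reshape(z, (2 * n, -1))                    # flatten features, shape (2N, d)
        z = tf.math.l2_normalize(z, axis=1)               # unit vectors, so z @ z.T gives cosine similarities
        sim = tf.matmul(z, z, transpose_b=True) / tau     # (2N, 2N) scaled similarities

        # Positives: sim(i, i+N) for the first half and sim(i+N, i) for the second (equal values)
        pos = tf.linalg.diag_part(sim[:n, n:])            # shape (N,)
        pos = tf.concat((pos, pos), 0)                    # shape (2N,)

        # The denominator sums over all k != i; pushing the diagonal to a large
        # negative value makes exp(self-similarity) vanish
        sim = sim - tf.eye(2 * n) * 1e9
        denominator = tf.reduce_sum(tf.math.exp(sim), axis=1)

        # -log(exp(pos) / denominator), averaged over all 2N samples
        return tf.reduce_mean(tf.math.log(denominator) - pos)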

    Of course, if anybody has a quicker implementation, I am more than happy to hear about it!