tensorflow, cosine-similarity

How can I get the cosine similarity of every element of an array with every other element of the same array using TensorFlow?


Given an array of sentence embeddings (vectors of length 512) with shape (1000000, 512), how do I calculate the cosine similarity of each of the 1 million embeddings against every other embedding in the array? Ideally I would use TensorFlow, so I can try to speed it up with a GPU.


Solution

  • You can compute the pairwise cosine distances as follows. The cosine similarity is 1 minus the distance, i.e. the tf.matmul result on its own:

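    import numpy as np
    import tensorflow as tf
    
    # 100 random embeddings of length 512, standing in for the real data
    X = np.random.uniform(0,10, (100,512)).astype('float32')
    X = tf.constant(X)
    
    def compute_cosine_distances(a, b):
        # L2-normalize each row so the dot product of two rows is their cosine similarity
        normalize_a = tf.nn.l2_normalize(a,1)
        normalize_b = tf.nn.l2_normalize(b,1)
        # cosine distance = 1 - cosine similarity, for every pair of rows
        distance = 1 - tf.matmul(normalize_a, normalize_b, transpose_b=True)
    
        return distance
    
    compute_cosine_distances(X, X)  # tensor of shape (100, 100)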
    

    This gives the same result as scikit-learn (up to floating-point rounding):

    from sklearn.metrics.pairwise import pairwise_distances
    
    pairwise_distances(X.numpy(), metric='cosine')
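
For the full (1000000, 512) array from the question, the complete similarity matrix would have 10^12 entries (about 4 TB in float32), so it cannot be held in memory at once. Here is a minimal sketch of one way around this, assuming you can consume the result block by block: normalize once, then multiply one chunk of rows at a time against the whole array. The helper name cosine_similarity_in_chunks and the chunk_size parameter are illustrative and not part of TensorFlow.

```python
import numpy as np
import tensorflow as tf

def cosine_similarity_in_chunks(embeddings, chunk_size=256):
    """Yield (start_row, block) pairs, where block holds the cosine
    similarities of rows [start_row, start_row + chunk_size) against all rows.

    Hypothetical helper for illustration; chunk_size is an assumed knob
    that should be tuned to the available GPU memory.
    """
    # Normalize every row once so a plain dot product equals cosine similarity.
    normalized = tf.nn.l2_normalize(embeddings, axis=1)
    n = normalized.shape[0]
    for start in range(0, n, chunk_size):
        chunk = normalized[start:start + chunk_size]
        # (rows_in_chunk, d) x (d, n) -> (rows_in_chunk, n)
        yield start, tf.matmul(chunk, normalized, transpose_b=True)

# Small demo on the same kind of random data as above.
X = tf.constant(np.random.uniform(0, 10, (100, 512)).astype("float32"))

# Concatenating the blocks is only feasible for small inputs like this demo.
sims = tf.concat(
    [block for _, block in cosine_similarity_in_chunks(X, chunk_size=32)],
    axis=0,
)
print(sims.shape)  # (100, 100)
```

With chunk_size=256 and 1,000,000 rows, each block holds 256 x 1000000 float32 values, roughly 1 GB. On real data you would reduce each block before moving to the next (for example, keep only the nearest neighbours with tf.math.top_k) instead of concatenating them.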