Given an array of sentence embeddings (each a vector of length 512) with shape (1000000, 512), how do I calculate the cosine similarity of every one of the 1 million sentence embeddings against every other embedding in the array, ideally using TensorFlow so I can try to speed it up with a GPU?
You can compute the pairwise cosine distances (1 - cosine similarity) like this:
import numpy as np
import tensorflow as tf

X = np.random.uniform(0, 10, (100, 512)).astype('float32')
X = tf.constant(X)
def compute_cosine_distances(a, b):
    # L2-normalize each row so the dot products become cosine similarities
    normalize_a = tf.nn.l2_normalize(a, axis=1)
    normalize_b = tf.nn.l2_normalize(b, axis=1)
    # Pairwise cosine distance = 1 - similarity, shape (rows_of_a, rows_of_b)
    distance = 1 - tf.matmul(normalize_a, normalize_b, transpose_b=True)
    return distance
compute_cosine_distances(X, X)
which gives the same result as
from sklearn.metrics.pairwise import pairwise_distances
pairwise_distances(X.numpy(), metric='cosine')
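
Note that for the full case in the question, a 1,000,000 x 1,000,000 float32 result matrix alone would take roughly 4 TB, so it cannot be materialized in one go. A minimal sketch of one workaround, using the same normalize-then-matmul idea but processing the rows in chunks and keeping only the top-k most similar neighbours per row (the helper name, top_k and batch_size are illustrative choices, not part of the original answer):

import tensorflow as tf

def top_k_cosine_neighbours(X, top_k=10, batch_size=1024):
    # Normalize once so each chunk's matmul yields cosine similarities directly
    X_norm = tf.nn.l2_normalize(X, axis=1)
    all_values, all_indices = [], []
    n = int(tf.shape(X_norm)[0])
    for start in range(0, n, batch_size):
        chunk = X_norm[start:start + batch_size]            # (batch_size, 512)
        sims = tf.matmul(chunk, X_norm, transpose_b=True)   # (batch_size, n) similarities
        values, indices = tf.math.top_k(sims, k=top_k)      # keep the k most similar per row
        all_values.append(values)
        all_indices.append(indices)
    return tf.concat(all_values, axis=0), tf.concat(all_indices, axis=0)

similarities, indices = top_k_cosine_neighbours(X)

If you need distances instead of similarities, they are just 1 - similarities for the returned values.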