I have a tensor like this:
tf_a1 = [[-0.65 0. 0. 0. 0.42 0. 0. 0.51 0. 0.34 0.]
[0. -0.51 0. 0. -0.52 0. 0. 0. 0.53 0.42 0.]
[0. 0.32 0. -0.50 0.34 0. 0. 0.39 0.32 0.52 0.]
[0. 0.23 0.37 0. 0. 0.37 0.37 0. 0.47 0.39 0.3 ]]
I want to compute the cosine similarity between the columns of this tensor: the first column against the rest of the columns, then the second column against the rest, and so on.
I have done this using a for loop, like so:
def cosine_score(x):
    for i, arr in enumerate(x):
        if i == 0:
            first = cosine_similarity(x[i,].reshape(1, -1), x)
        second = cosine_similarity(x[i,].reshape(1, -1), x)
        final = tf.concat((first, second), axis=0)
        first = final
    return final

sim_topics = cosine_score(tf_a1)
Now, when I want to include this in my model, I cannot use the for loop as it is. It seems I have to use tf.map_fn to go over it.
I have also tried this:
def cosine_score(x):
    def cos_similarity(col):
        for i, arr in enumerate(col):
            if i == 0:
                first = cosine_similarity(col[i, ].reshape(1, -1), col)
            second = cosine_similarity(col[i, ].reshape(1, -1), col)
            final = tf.concat((first, second), axis=0)
            first = final
        return final
    sim = tf.map_fn(cos_similarity, x, dtype=tf.float32)
    return sim
But here I still need to remove the for loop. My problem is: if I remove the for loop and access each column separately, how can I access the rest of the columns to compare against and apply cosine similarity?
Please let me know if it's not clear.
Cosine similarity is nothing more than an L2-normalized dot product. So, in TensorFlow, this should do the trick for you:
# Normalize the columns of the tensor
normalized_tensor = tf.math.l2_normalize(tf_a1, axis=0)
# Get the dot product between the columns
scores = tf.matmul(normalized_tensor, normalized_tensor, transpose_a=True)
The tensor scores contains the cosine similarity between the columns of tf_a1. Moreover, below is an equivalent NumPy implementation:
# Normalize the columns of the tensor
normalized_tensor = tf_a1 / np.linalg.norm(tf_a1, axis=0)
# Get the dot product between the columns
scores = np.dot(normalized_tensor.T, normalized_tensor)
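As a quick sanity check, the sketch below compares the matrix-product result against an explicit pairwise loop over the columns, using the data from the question. The helper names `column_cosine_matrix` and `column_cosine_loop` are made up for illustration, and the sketch assumes no column is entirely zero (a zero column has norm 0, so the division would produce NaN).

```python
import numpy as np

# The tensor from the question, as a NumPy array (4 rows, 11 columns)
tf_a1 = np.array([
    [-0.65, 0.,    0.,   0.,    0.42, 0.,   0.,   0.51, 0.,   0.34, 0.],
    [0.,    -0.51, 0.,   0.,   -0.52, 0.,   0.,   0.,   0.53, 0.42, 0.],
    [0.,    0.32,  0.,   -0.50, 0.34, 0.,   0.,   0.39, 0.32, 0.52, 0.],
    [0.,    0.23,  0.37, 0.,    0.,   0.37, 0.37, 0.,   0.47, 0.39, 0.3],
])

def column_cosine_matrix(a):
    """Cosine similarity between all pairs of columns via one matrix product."""
    normalized = a / np.linalg.norm(a, axis=0)  # unit-length columns
    return normalized.T @ normalized

def column_cosine_loop(a):
    """Same result with an explicit double loop, for comparison only."""
    n = a.shape[1]
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            denom = np.linalg.norm(a[:, i]) * np.linalg.norm(a[:, j])
            out[i, j] = (a[:, i] @ a[:, j]) / denom
    return out

scores = column_cosine_matrix(tf_a1)
loop_scores = column_cosine_loop(tf_a1)
```

Entry `scores[i, j]` is the similarity between column `i` and column `j`, so the result is a symmetric 11 x 11 matrix with ones on the diagonal.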
Finally, if you want to keep only one of the triangles (for example, the upper triangle) and set the main diagonal to 0, you can do the following in TensorFlow:
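# Zero out the main diagonal (each column's similarity with itself)
zero_diag = tf.linalg.set_diag(scores, tf.zeros(tf.shape(scores)[0], dtype=scores.dtype))
# Keep the upper triangle (tf.linalg.band_part is the current name of tf.matrix_band_part)
triangular = tf.linalg.band_part(zero_diag, 0, -1)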
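For the NumPy version, the same masking step can be sketched with `np.triu`. This is a minimal illustration on a small made-up matrix `a` (not the data from the question); passing `k=1` keeps only the entries strictly above the main diagonal, which zeroes the diagonal and the lower triangle in one call.

```python
import numpy as np

# Small illustrative matrix: 2 rows, 3 columns (hypothetical data)
a = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, 1.0]])

# Column-wise cosine similarity, as in the NumPy snippet above
normalized = a / np.linalg.norm(a, axis=0)
scores = normalized.T @ normalized

# Keep only the strict upper triangle; the diagonal and lower part become 0
upper = np.triu(scores, k=1)
```

Each unordered pair of columns then appears exactly once, at `upper[i, j]` with `i < j`.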