Let's say I have a matrix like this:
[[5.05537647 4.96643654 4.88792309 4.48089566 4.4469417 3.7841264]
[4.81800568 4.75527558 4.69862751 3.81999698 3.7841264 3.68258605]
[4.64717983 4.60021917 4.55716111 4.07718641 4.0245128 4.69862751]
[4.51752158 4.35840703 4.30839634 3.97312429 3.9655597 3.68258605]
[4.38592909 4.33261686 4.2856032 4.26411249 4.24381326 3.7841264]]
I need to calculate the cosine similarity between the rows of this matrix, but without using the cosine similarity functions from "scipy" or "sklearn.metrics.pairwise". I am allowed to use "math".
I tried the code below, but I can't figure out how to iterate over each row of the matrix.
import math
def cosine_similarity(matrix):
    for row1 in matrix:
        for row2 in matrix:
            sum1, sum2, sum3 = 0, 0, 0
            for i in range(len(row1)):
                a = row1[i]; b = row2[i]
                sum1 += a*a
                sum2 += b*b
                sum3 += a*b
            return sum3 / math.sqrt(sum1*sum2)
cosine_similarity(matrix)
Do you have any idea how I can do that? Thank you!
Since your matrix is a NumPy array, you can use vectorized operations on whole rows instead of looping over individual elements. math.sqrt does not operate element-wise on arrays, so np.sqrt is used for the square root; it works on both scalars and arrays. The following code stores the similarity values in a list and returns it.
import numpy as np
def cosine_similarity(matrix):
    sim_index = []
    for row1 in matrix:
        for row2 in matrix:
            sim_index.append(sum(row1*row2)/np.sqrt(sum(row1**2) * sum(row2**2)))
    return sim_index
cosine_similarity(matrix)
# 1.0,0.9985287276116063,0.9943589065201967,0.9995100043150523,0.9986115804314727,
# 0.9985287276116063,1.0,0.9952419798474134,0.9984515542959852,0.9957338741601842,
# 0.9943589065201967,0.9952419798474134,1.0,0.9970632589904104,0.9962784686967592,
# 0.9995100043150523,0.9984515542959852,0.9970632589904104,1.0,0.9992584450362125,
# 0.9986115804314727,0.9957338741601842,0.9962784686967592,0.9992584450362125,1.0
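If NumPy is off limits as well and only math may be used, as the question states, the same nested loop can be written in pure Python. The main fix compared to the code in the question is that each result is collected instead of returning on the first pair of rows. This is only a minimal sketch: the name cosine_similarity_rows and the small toy matrix are made up for illustration, and it assumes no row is all zeros (that would divide by zero).

```python
import math

def cosine_similarity_rows(matrix):
    """Return a nested list; entry [i][j] is the cosine similarity of rows i and j."""
    result = []
    for row1 in matrix:
        sims = []
        for row2 in matrix:
            # Dot product and the two Euclidean norms, element by element.
            dot = sum(a * b for a, b in zip(row1, row2))
            norm1 = math.sqrt(sum(a * a for a in row1))
            norm2 = math.sqrt(sum(b * b for b in row2))
            sims.append(dot / (norm1 * norm2))  # assumes non-zero rows
        result.append(sims)
    return result

# Toy input (hypothetical): row 1 is parallel to row 0, row 2 is orthogonal to both.
toy_matrix = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 0.0, -1.0]]
sims = cosine_similarity_rows(toy_matrix)
for row in sims:
    print(row)
```

Because the function returns one inner list per row, the result already has the square shape of a similarity matrix and needs no reshaping.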
An even shorter version uses a list comprehension:
sim_index = np.array([sum(r1*r2)/np.sqrt(sum(r1**2) * sum(r2**2)) for r1 in matrix for r2 in matrix])
The resulting list is converted to an array so that it can be reshaped for plotting.
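If you want to drop the Python loops entirely, one option is to normalize every row to unit length and take a single matrix product, which gives the square similarity matrix directly. This is a sketch rather than the only way to do it: cosine_similarity_matrix and the example array are illustrative names, and rows are assumed to be non-zero.

```python
import numpy as np

def cosine_similarity_matrix(matrix):
    """Return an (n, n) array of cosine similarities between the n rows."""
    matrix = np.asarray(matrix, dtype=float)
    # Euclidean norm of each row, kept as a column so it broadcasts over the row.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit_rows = matrix / norms  # assumes no all-zero rows
    # Dot products of unit vectors are exactly the cosine similarities.
    return unit_rows @ unit_rows.T

# Small hypothetical example: the first two rows are orthogonal.
example = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
sim = cosine_similarity_matrix(example)
print(sim)
```

The result is already two-dimensional, so it can be passed to plt.imshow without a reshape.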
Visualizing the similarity matrix: since each row is identical to itself, its similarity with itself is 1 (yellow). Hence the diagonal of the plotted matrix is fully yellow (index = 1).
import matplotlib.pyplot as plt
plt.imshow(sim_index.reshape((5,5)))
plt.colorbar()
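A quick way to double-check the result before trusting the plot is to test two properties any cosine similarity matrix should have: ones on the diagonal and symmetry. The snippet below is a sketch that rebuilds the matrix from the question and reuses the list comprehension from above; the names diagonal_ok and symmetric_ok are just illustrative.

```python
import numpy as np

# Matrix copied from the question.
matrix = np.array([
    [5.05537647, 4.96643654, 4.88792309, 4.48089566, 4.4469417, 3.7841264],
    [4.81800568, 4.75527558, 4.69862751, 3.81999698, 3.7841264, 3.68258605],
    [4.64717983, 4.60021917, 4.55716111, 4.07718641, 4.0245128, 4.69862751],
    [4.51752158, 4.35840703, 4.30839634, 3.97312429, 3.9655597, 3.68258605],
    [4.38592909, 4.33261686, 4.2856032, 4.26411249, 4.24381326, 3.7841264],
])

sim_index = np.array([sum(r1*r2)/np.sqrt(sum(r1**2) * sum(r2**2))
                      for r1 in matrix for r2 in matrix])

# Use the number of rows instead of hard-coding (5, 5).
n = len(matrix)
sim = sim_index.reshape((n, n))

diagonal_ok = np.allclose(np.diag(sim), 1.0)  # each row compared with itself
symmetric_ok = np.allclose(sim, sim.T)        # sim(i, j) equals sim(j, i)
print(diagonal_ok, symmetric_ok)
```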