I am doing a project about cosine similarity on a movie dataset, and I'm confused about the formula for calculating cosine similarity.
But when I searched online, some articles show that the denominator is something like: sqrt(A1^2+B1^2) * sqrt(A2^2+B2^2) * ... * sqrt(Ai^2+Bi^2)
I'm confused: what's the difference? Which one is correct, or are they both correct?
The one in your image is correct. In two dimensions, it is derived from the law of cosines, which relates the length of one side of a triangle, c, to the lengths of the other two sides, a and b, and the angle opposite c, theta:
c^2 == a^2 + b^2 - 2*a*b*cos(theta)
You can prove this in many ways, and a good verification is that when cos(theta) == 0 (sides a and b are orthogonal), you recover the Pythagorean theorem: c^2 == a^2 + b^2.
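If it helps, here is a quick numerical check (a minimal sketch, assuming NumPy) that the law of cosines holds for a triangle built from two arbitrary vectors:

import numpy as np

# Two arbitrary vectors spanning a triangle with sides a, b and a - b.
a_vec = np.array([3.0, 1.0])
b_vec = np.array([1.0, 4.0])
a, b = np.linalg.norm(a_vec), np.linalg.norm(b_vec)
c = np.linalg.norm(a_vec - b_vec)           # length of the third side
cos_theta = np.dot(a_vec, b_vec) / (a * b)  # angle between a_vec and b_vec
print(np.isclose(c**2, a**2 + b**2 - 2*a*b*cos_theta))  # True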
To get the formula in the image, you have to translate this into analytic geometry (vectors):
norm(A-B)^2 == norm(A)^2 + norm(B)^2 - 2*norm(A)*norm(B)*cos(theta)
and by using that norm(A-B)^2 is by definition (A-B)*(A-B), expanding gives A*A - 2*A*B + B*B, i.e.
norm(A-B)^2 == norm(A)^2 + norm(B)^2 - 2*A*B
So, equating both expressions and cancelling the common terms yields
norm(A)*norm(B)*cos(theta) == A*B
which is the (rearranged) formula in your definition, using norm(v) == sqrt(v*v). For n dimensions this still works, because rotating Euclidean space preserves norms and inner products, and because the 2D plane spanned by the two vectors is just a rotation of the xy plane.
A good sanity check is, again, that orthogonal vectors yield a cosine of 0, and that the cosine always lies between -1 and 1 (this is the Cauchy-Schwarz inequality).
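If you want to see the derived identity in action, here is a minimal from-scratch sketch (assuming NumPy) of cos(theta) == A*B / (norm(A)*norm(B)):

import numpy as np

def cos_sim(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # cos(theta) = (A . B) / (norm(A) * norm(B))
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos_sim([4, 3], [5, 5]))    # ~0.9899, nearly parallel vectors
print(cos_sim([1, 0], [0, 1]))    # 0.0, orthogonal vectors
print(cos_sim([2, 1], [-4, -2]))  # -1.0, opposite directions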
Update: for the examples mentioned in your comment, you can reproduce the results from the blog by running
import sklearn.metrics.pairwise as pw

# Each input row is one vector, so the result is a 1x1 similarity matrix.
print(pw.cosine_similarity([[4, 3]], [[5, 5]]))        # [[~0.9899]]
print(pw.cosine_similarity([[4, 3, 5]], [[5, 5, 1]]))  # [[~0.7921]]
Note that if you run:
from sklearn.metrics.pairwise import pairwise_distances
print(pairwise_distances([[4, 3, 5]], [[5, 5, 1]], metric='cosine'))
you get 0.208 instead of 0.792. This is because pairwise_distances with the cosine metric is defined as 1 - cos(theta) (note that 0.208 + 0.792 == 1). This transformation is used because, when talking about distances, you want the distance from a point to itself to be 0.
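As a quick check (a sketch assuming scikit-learn), the similarity and the distance always add up to 1, and the distance from a vector to itself is 0 up to floating-point rounding:

import sklearn.metrics.pairwise as pw
from sklearn.metrics.pairwise import pairwise_distances

sim = pw.cosine_similarity([[4, 3, 5]], [[5, 5, 1]])
dist = pairwise_distances([[4, 3, 5]], [[5, 5, 1]], metric='cosine')
print(sim + dist)  # [[1.]] since the cosine distance is 1 - cos(theta)
print(pairwise_distances([[4, 3, 5]], [[4, 3, 5]], metric='cosine'))  # ~[[0.]]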