I am doing a project about cosine similarity on a movie dataset, and I'm confused about the formula for calculating cosine similarity.
But when I searched online, some articles show that the denominator is something like: sqrt(A1^2+B1^2) * sqrt(A2^2+B2^2) * ... * sqrt(Ai^2+Bi^2)
I'm confused: what's the difference? Which one is correct, or are they both correct?
The one in your image is correct. In two dimensions, it is derived from the law of cosines, which relates the length of one side of a triangle, c, to the lengths of the other two sides, a and b, and the angle opposite c, theta:
c^2 == a^2 + b^2 - 2*a*b*cos(theta)
You can prove this in many ways, and a good verification is that when cos(theta) == 0 (sides a and b are orthogonal), you recover the Pythagorean theorem: c^2 == a^2 + b^2.
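If it helps, here is a quick numerical check (a minimal sketch, assuming NumPy) that the law of cosines holds for a triangle built from two arbitrary vectors:

import numpy as np

# Two arbitrary vectors spanning a triangle with sides a, b and a - b.
a_vec = np.array([3.0, 1.0])
b_vec = np.array([1.0, 4.0])
a, b = np.linalg.norm(a_vec), np.linalg.norm(b_vec)
c = np.linalg.norm(a_vec - b_vec)           # length of the third side
cos_theta = np.dot(a_vec, b_vec) / (a * b)  # angle between a_vec and b_vec
print(np.isclose(c**2, a**2 + b**2 - 2*a*b*cos_theta))  # True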
To get the formula in the image, you have to translate this into analytic geometry (vectors):
norm(A-B)^2 == norm(A)^2 + norm(B)^2 - 2*norm(A)*norm(B)*cos(theta)
and by using that norm(A-B)^2 is by definition (A-B)*(A-B), expanding gives A*A - 2*A*B + B*B, i.e.
norm(A-B)^2 == norm(A)^2 + norm(B)^2 - 2*A*B
So, equating both expressions and cancelling the common terms yields
norm(A)*norm(B)*cos(theta) == A*B
which is the (rearranged) formula in your definition, using norm(v) == sqrt(v*v). For n dimensions this still works, because rotating Euclidean space preserves norms and inner products, and because the 2D plane spanned by the two vectors is just a rotation of the xy plane.
A good sanity check is, again, that orthogonal vectors yield a cosine of 0, and that the cosine always lies between -1 and 1 (this is the Cauchy-Schwarz inequality).
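If you want to see the derived identity in action, here is a minimal from-scratch sketch (assuming NumPy) of cos(theta) == A*B / (norm(A)*norm(B)):

import numpy as np

def cos_sim(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # cos(theta) = (A . B) / (norm(A) * norm(B))
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos_sim([4, 3], [5, 5]))    # ~0.9899, nearly parallel vectors
print(cos_sim([1, 0], [0, 1]))    # 0.0, orthogonal vectors
print(cos_sim([2, 1], [-4, -2]))  # -1.0, opposite directions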
Update: for the examples mentioned in your comment, you can reproduce the results from the blog by running
import sklearn.metrics.pairwise as pw

# Each input row is one vector, so the result is a 1x1 similarity matrix.
print(pw.cosine_similarity([[4, 3]], [[5, 5]]))        # [[~0.9899]]
print(pw.cosine_similarity([[4, 3, 5]], [[5, 5, 1]]))  # [[~0.7921]]
Note that if you run:
from sklearn.metrics.pairwise import pairwise_distances
print(pairwise_distances([[4, 3, 5]], [[5, 5, 1]], metric='cosine'))
you get 0.208 instead of 0.792. This is because pairwise_distances with the cosine metric is defined as 1 - cos(theta) (note that 0.208 + 0.792 == 1). This transformation is used because, when talking about distances, you want the distance from a point to itself to be 0.
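As a quick check (a sketch assuming scikit-learn), the similarity and the distance always add up to 1, and the distance from a vector to itself is 0 up to floating-point rounding:

import sklearn.metrics.pairwise as pw
from sklearn.metrics.pairwise import pairwise_distances

sim = pw.cosine_similarity([[4, 3, 5]], [[5, 5, 1]])
dist = pairwise_distances([[4, 3, 5]], [[5, 5, 1]], metric='cosine')
print(sim + dist)  # [[1.]] since the cosine distance is 1 - cos(theta)
print(pairwise_distances([[4, 3, 5]], [[4, 3, 5]], metric='cosine'))  # ~[[0.]]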