cluster-analysis distance data-mining similarity cosine-similarity

some questions on cosine similarity

Yesterday I learnt that the cosine similarity, defined as

sim(A,B) = (A * B) / (||A||2 * ||B||2)

can effectively measure how similar two vectors are.

I see that this definition uses the L2-norm to normalize the dot product of A and B. What I would like to know is: why not use the L1-norm of A and B in the denominator instead?

My teacher told me that if I used the L1-norm in the denominator, the cosine similarity would not be 1 when A=B. I then asked him: if I modify the definition as follows, what are the advantages and disadvantages of the modified model compared with the original one?

sim(A,B) = (A * B) / (||A||1 * ||B||1) if A!=B

sim(A,B) = 1 if A==B

I would appreciate it if someone could explain this further.

Solution

If you use the L1-norm, you are not computing the cosine anymore.

Cosine is a geometric concept, not an arbitrary definition. There is a whole body of mathematics attached to it. If you use the L1-norm, you are not measuring angles anymore.

Note that, on L2-normalized vectors, Euclidean distance is a monotone (decreasing) function of cosine similarity:

Euclidean(x,y)^2 = sum( (x-y)^2 ) = sum(x^2) + sum(y^2) - 2 sum(x*y)

If x and y are L2-normalized, then sum(x^2)=sum(y^2)=1, and then

Euclidean(x_norm,y_norm)^2 = 2 * (1 - sum(x_norm*y_norm)) = 2 * (1 - cossim(x,y))

So using cosine similarity essentially means standardizing your data to unit length. There are also computational benefits: for sparse data, sum(x*y) is cheaper to compute, because only the coordinates that are nonzero in both vectors contribute to it.
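To illustrate the sparse-data point, here is a minimal sketch. It assumes each vector is stored as a Python dict mapping an index to its nonzero value; that representation and the name `sparse_dot` are only illustrative choices, not something taken from the question.

```python
def sparse_dot(x, y):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    # Iterate over the vector with fewer nonzero entries and look up the other.
    if len(x) > len(y):
        x, y = y, x
    # Indices missing from either dict are zero and contribute nothing.
    return sum(value * y[index] for index, value in x.items() if index in y)


# Two vectors in a high-dimensional space with only three nonzeros each.
x = {0: 3.0, 7: 1.0, 1000: 2.0}
y = {7: 4.0, 42: 5.0, 1000: 0.5}

print(sparse_dot(x, y))  # 1.0*4.0 + 2.0*0.5 = 5.0
```

The work is proportional to the number of nonzero entries in the smaller vector, not to the dimensionality of the space.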

Taking the square root, for L2-normalized data:

Euclidean(x_norm, y_norm) = sqrt(2) * sqrt(1-cossim(x,y))
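As a quick numeric sanity check of this identity, here is a small sketch in plain Python. The helper names and the two example vectors are arbitrary and made up for illustration.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def l2_normalize(x):
    norm = math.sqrt(dot(x, x))
    return [a / norm for a in x]


def cosine_similarity(x, y):
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Arbitrary example vectors (not from the question).
x = [3.0, 1.0, 0.0, 2.0]
y = [1.0, 0.0, 4.0, 2.0]

# Left side: Euclidean distance between the L2-normalized vectors.
lhs = euclidean(l2_normalize(x), l2_normalize(y))
# Right side: sqrt(2) * sqrt(1 - cossim(x, y)) on the raw vectors.
rhs = math.sqrt(2.0) * math.sqrt(1.0 - cosine_similarity(x, y))

print(lhs, rhs)  # the two values should agree up to floating-point rounding
```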

For the second part of your question: patching the L1-norm version isn't that easy. Consider the vectors (1,1) and (2,2). Obviously, these two vectors point in the same direction (the angle between them is 0), and thus should have cosine similarity 1.

Using your equation, they would have similarity (2+2)/(2*4) = 0.5

Now look at the vectors (0,1) and (0,2), which most people would agree should be exactly as similar as the pair in the example above (and cosine indeed gives the same similarity for both pairs). Your equation yields (0+2)/(1*2) = 1. The same geometric relationship gets two different scores, so your similarity does not match any intuition, does it?
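If you want to experiment with this yourself, here is a rough sketch that evaluates the modified definition exactly as written in the question (product of the two L1-norms in the denominator, with the special case for A==B) next to the ordinary cosine similarity. The function names are mine and purely illustrative.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def l1_similarity(a, b):
    """Modified similarity from the question (hypothetical name)."""
    if a == b:
        return 1.0
    l1_a = sum(abs(v) for v in a)
    l1_b = sum(abs(v) for v in b)
    return dot(a, b) / (l1_a * l1_b)


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Both pairs consist of two parallel vectors.
pairs = [((1, 1), (2, 2)), ((0, 1), (0, 2))]

for a, b in pairs:
    print(a, b, "L1-based:", l1_similarity(a, b), "cosine:", cosine_similarity(a, b))
```

The L1-based score differs between the two pairs even though both are pairs of parallel vectors, while the cosine is 1 (up to rounding) in both cases.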