I'm playing with scipy's cosine distance. From what I've gathered, the closer a cosine distance is to 1, the more similar the vectors are. I got some unexpected results in a text mining project, so I decided to investigate the simplest case.
```python
import numpy as np
import scipy.spatial.distance

# Cosine distance between two identical vectors
arr1 = np.array([1, 1])
arr2 = np.array([1, 1])
print(scipy.spatial.distance.cosine(arr1, arr2))
```
My program prints 0.0.
Shouldn't the result be 1.0? Why or why not?
It is the cosine distance, not the cosine similarity. A basic requirement for a function d(u, v) to be a distance is that d(u, u) = 0.
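The sketch below checks that requirement with the same scipy.spatial.distance.cosine call used in the question. The example vectors are my own choice. Identical vectors give 0, orthogonal vectors give 1, and opposite vectors give 2, so smaller means more similar. Note also that parallel vectors of different length give 0, because only the angle between the vectors matters.

```python
import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 1.0])

# Identical vectors: a distance must satisfy d(u, u) = 0
print(distance.cosine(u, u))             # 0.0
# Orthogonal vectors: cosine similarity 0, so distance 1
print(distance.cosine([1, 0], [0, 1]))   # 1.0
# Opposite vectors: cosine similarity -1, so distance 2
print(distance.cosine([1, 0], [-1, 0]))  # 2.0
# Same direction, different length: still 0, only the angle matters
print(distance.cosine([1, 1], [3, 3]))   # 0.0
```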
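See the definition of the formula in the docstring of scipy.spatial.distance.cosine, and notice that the formula begins 1 - (...), i.e. it is 1 - (u · v) / (||u||_2 ||v||_2). Your expectation of the function is probably based on the quantity in (...), but that expression is the cosine similarity. The distance is 1 minus the similarity, so two identical vectors give 0.0 rather than 1.0.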
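To make the relationship concrete, here is a minimal sketch that computes the quantity in (...) directly with NumPy and compares it with SciPy's result. The helper name cosine_similarity is hypothetical, written here for illustration only, and is not part of scipy.spatial.distance. Because the norms are computed in floating point, the similarity for the question's vectors comes out very close to 1 rather than exactly 1, so the comparison uses np.isclose.

```python
import numpy as np
from scipy.spatial import distance


def cosine_similarity(u, v):
    """Hypothetical helper: dot product divided by the product of Euclidean norms."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


arr1 = np.array([1, 1])
arr2 = np.array([1, 1])

sim = cosine_similarity(arr1, arr2)   # the quantity in (...): about 1 for identical vectors
dist = distance.cosine(arr1, arr2)    # SciPy's cosine distance: 1 - similarity

print(sim)
print(dist)
print(np.isclose(dist, 1 - sim))      # True
```

If you want a similarity score from SciPy, 1 - distance.cosine(u, v) gives it.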