Tags: machine-learning, distance, euclidean-distance, cosine-similarity, hamming-distance

Are there other useful similarity or distance metrics?


I'm developing an approximate computation system. Measuring how similar two objects are is a basic operation in such a system.

In computer science and math, similarity is usually treated as synonymous with the distance between two objects, but it is not always clear to me which kinds of applications each of the following measures is used in:

  1. The Jaccard coefficient is used in information retrieval for ranking and scoring.
  2. Cosine similarity is used for real-valued vectors, for example to measure the similarity between documents (it ignores term position and considers only term frequency).
  3. Hamming distance is used for binary vectors, for example to measure the similarity between binary descriptors (such as ORB) in computer vision and image processing.
  4. Euclidean distance is used for real-valued vectors to measure the distance between two points (it is often referred to as the L^2 distance).
  5. Kernel functions: in machine learning, some kernel functions (such as the RBF kernel) are used as similarity measures by exploiting the kernel trick.
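
To make the five measures listed above concrete, here is a minimal sketch in plain Python. The function names, the `gamma` parameter of the RBF kernel, and the convention for two empty sets are illustrative assumptions, not taken from any particular library; production code would more likely rely on an existing numerical package.

```python
import math


def jaccard(a, b):
    """Jaccard coefficient of two collections treated as sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


def cosine_similarity(x, y):
    """Cosine of the angle between two real-valued vectors (undefined for zero vectors)."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(xi != yi for xi, yi in zip(x, y))


def squared_euclidean(x, y):
    """Sum of squared coordinate differences."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def euclidean(x, y):
    """Euclidean (L^2) distance between two points."""
    return math.sqrt(squared_euclidean(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma is an assumed width parameter."""
    return math.exp(-gamma * squared_euclidean(x, y))
```

Note that the first, second, and fifth functions return similarities (larger means more alike), while Hamming and Euclidean return distances (smaller means more alike), so the values are not directly comparable across measures.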

I know that each of these metrics is defined differently, but I wonder whether there is a survey or paper that lists possible applications in computer science for each of them (or for others that I didn't mention). Can you help me with this?


Solution

  • A quick search turned up "Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions", which looks excellent.