data-mining similarity hamming-distance

What's the relationship between Hamming distance and Simple Matching Coefficient?


I'm working through the exercises in Introduction to Data Mining and got stuck on the following question:

Which approach, Jaccard or Hamming distance, is more similar to the Simple Matching Coefficient, and which approach is more similar to the cosine measure? Explain. (Note: The Hamming measure is a distance, while the other three measures are similarities, but don’t let this confuse you.)

I think that the Hamming distance is similar to the SMC, since both of them look at the whole data set and compare whether data points are similar or dissimilar. But the book's solution reads as follows:

The Hamming distance is similar to the SMC. In fact, SMC = Hamming distance / number of bits.

Did the solution make a mistake? I think the Hamming distance and the SMC are not equal to each other; instead, the Hamming distance plus the SMC equals 1.


Solution

  • Hamming / length = 1 - SMC

    is a very strong relationship: given the number of bits, each measure can be computed exactly from the other. Because of this, they are equivalent in their capabilities.

    Your argument of "looking at the whole data set" is wrong: each measure is defined on a pair of objects.

    The point of this exercise is to practise your basic math skills by transforming equations into one another. This is a skill you will need frequently:

    1. you don't need to explore equivalent functions, one is enough
    2. of equivalent functions, one may be more efficient to compute than another
    3. of equivalent functions, one may be more precise than another due to floating point math.
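The relationship Hamming / length = 1 - SMC can be checked numerically with a minimal sketch like the one below. The function names `hamming_distance` and `simple_matching_coefficient` and the two example bit vectors are illustrative assumptions, not taken from the book; the SMC is computed here as the fraction of positions where the two vectors agree.

```python
def hamming_distance(x, y):
    """Count the positions at which two equal-length bit sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))


def simple_matching_coefficient(x, y):
    """Fraction of positions at which two equal-length bit sequences agree."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if len(x) == 0:
        raise ValueError("sequences must not be empty")
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical example vectors (8 bits each, differing in 3 positions).
x = [1, 0, 0, 1, 1, 0, 1, 0]
y = [1, 1, 0, 0, 1, 0, 0, 0]

n = len(x)
d = hamming_distance(x, y)
smc = simple_matching_coefficient(x, y)

print(d, n, smc)           # mismatches, length, fraction of matches
print(d / n == 1 - smc)    # the identity Hamming / length = 1 - SMC
```

For these particular vectors the fractions are exactly representable in binary floating point, so the comparison with `==` holds. For arbitrary lengths it is safer to compare the integer counts (mismatches plus matches equals the length), which ties in with the third point in the list above.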