Search code examples
optimizationhamming-distance

Sum of Hamming Distances


I started preparing for an interview and came across this problem:

  • An array of integers is given
  • Now calculate the sum of Hamming distances of all pairs of integers in the array in their binary representation.

Example:

given {1,2,3} or {001,010,011} (used 3 bits just to simplify)
result= HD(001,010)+HD(001,011)+HD(010,011)= 2+1+1=4;

The only optimization, from a purely brute force solution, I know I can use here, is in the individual calculation of Hamming Distance as seen here:

int hamming_distance(unsigned x, unsigned y)
{
    int       dist;
    unsigned  val;

    dist = 0;
    val = x ^ y;    // XOR

    // Count the number of bits set
    while (val != 0)
    {
        // A bit is set, so increment the count and clear the bit
        dist++;
        val &= val - 1;
    }

    // Return the number of differing bits
    return dist;
}

What's the best way to go about solving this problem?


Solution

  • You can consider the bit-positions separately. That gives you 32 (or some other number) of easier problems, where you still have to calculate the sum of all pairs of hamming distances, except now it's over 1-bit numbers.

    The hamming distance between two 1-bit numbers is their XOR.

    And now it has become the easiest case of this problem - it's already split per bit.

    So to reiterate the answer to that question, you take a bit position, count the number of 0's and the number of 1's, multiply those to get the contribution of this bit position. Sum those for all bit positions. It's even simpler than the linked problem, because the weight of the contribution of every bit is 1 in this problem.