Search code examples
stringalgorithmhamming-distance

Sum of all Hamming distances of a given string A from substrings of length |A| of another string B


Given two binary strings a and b, find the sum of the Hamming distances between a and all contiguous substrings of b of length |a|.

inputCopy:

01

00111

outputCopy:

3

Explanation: For the first sample case, there are four contiguous substrings of b of length |a|: "00", "01", "11", and "11". The distance between "01" and "00" is |0 - 0| + |1 - 0| = 1. The distance between "01" and "01" is |0 - 0| + |1 - 1| = 0. The distance between "01" and "11" is |0 - 1| + |1 - 1| = 1. Last distance counts twice, as there are two occurrences of string "11". The sum of these edit distances is 1 + 0 + 1 + 1 = 3.

In this question, i'm only thinking of a brute force solution with time complexity O(|a|.|b|) like a string matching algorithm... Is there any faster algo to do this problem


Solution

  • As you are computing the sum of Hamming distances, it can be done very fast:

    H := sum of Hamming distances
    compute array A and B such as the following:
        A[i]: the number of zeros up to i-th element of b
        B[i]: the number of ones up to i-th element of b
    
    iterate over elements of a for i <- 0:|a|-1:
        as a[i] should be compared with all elemetns of b[i] .. b[|b|-|a|+i]
        its effect over the value of summing distances is:
            if a[i] == 0:
                H += B[|b|-|a|+i] - B[i-1]
            else: 
                H += A[|b|-|a|-1] - A[i-1]
    

    In the above pesudocode B[|b|-|a|+i] - B[i-1] means number of ones between the i-th element and |b|-|a|+i-th element of b the same for A[|b|-|a|-1] - A[i-1]). These are elements that the i-th member of a should be compared with to compute the sum of Hamming distances. Hence, the times complexity of this algorithm is \Theta(|a| + |b|).