Given two binary strings a and b, find the sum of the Hamming distances between a and all contiguous substrings of b of length |a|.
inputCopy:
01
00111
outputCopy:
3
Explanation: For the first sample case, there are four contiguous substrings of b of length |a|: "00", "01", "11", and "11". The distance between "01" and "00" is |0 - 0| + |1 - 0| = 1. The distance between "01" and "01" is |0 - 0| + |1 - 1| = 0. The distance between "01" and "11" is |0 - 1| + |1 - 1| = 1. Last distance counts twice, as there are two occurrences of string "11". The sum of these edit distances is 1 + 0 + 1 + 1 = 3.
In this question, i'm only thinking of a brute force solution with time complexity O(|a|.|b|) like a string matching algorithm... Is there any faster algo to do this problem
As you are computing the sum of Hamming distances, it can be done very fast:
H := sum of Hamming distances
compute array A and B such as the following:
A[i]: the number of zeros up to i-th element of b
B[i]: the number of ones up to i-th element of b
iterate over elements of a for i <- 0:|a|-1:
as a[i] should be compared with all elemetns of b[i] .. b[|b|-|a|+i]
its effect over the value of summing distances is:
if a[i] == 0:
H += B[|b|-|a|+i] - B[i-1]
else:
H += A[|b|-|a|-1] - A[i-1]
In the above pesudocode B[|b|-|a|+i] - B[i-1]
means number of ones between the i
-th element and |b|-|a|+i
-th element of b
the same for A[|b|-|a|-1] - A[i-1]
). These are elements that the i
-th member of a
should be compared with to compute the sum of Hamming distances. Hence, the times complexity of this algorithm is \Theta(|a| + |b|)
.