algorithm language-agnostic time-complexity complexity-theory

Same number of 0s and 1s algorithm

I'm trying to solve the following problem:

Given an binary array containing only 0s and 1s, find the largest subarray which contain equal no of 0s and 1s.

Examples:

Input: arr[] = {1, 0, 1, 1, 1, 0, 0,0,1} Output: 1 to 8 (Starting and Ending indexes of output sub array)

I could only think of an O(n^2) solution (i.e. the obvious way of starting an array at each subposition and then checking all remaining elements for having the same number of 0s and 1s).

Can somebody figure out a better solution for this problem?

Solution

One teensy tiny note on wording: you say find the longest subarray, which implies uniqueness, but even in your example there is more than one ( 0 to 7 or 1 to 8). It would be worded better as "find a subarray of maximal length" or similar. But that's a non-issue.

As for a faster algorithm, first define a new array swap by replacing each instance of a 0 with a -1. This can be done in O(n) time. For your example, we would have

1, -1, 1, 1, 1, -1, -1, -1, 1

Now define another array sum such that sum[i] is the sum of all values swap[0], swap[1], ..., swap[i]. Equivalently,

sum[0] = swap[0];
for i > 1, sum[i] = sum[i-1] + swap[i]

Which again is in O(n) time. So your example becomes

1, 0, 1, 2, 3, 2, 1, 0, 1

Now for an observation. If the number of 1s is equal to the number of 0s in the subarray (arr[i], ..., arr[j]) then in the first new array the 1s will cancel with the corresponding -1, so the sum of all values (swap[i], ..., swap[j]) will be equal to 0. But this is then equal to

swap[0] + swap[1] + ... + swap[j] - (swap[0] + swap[1] + ... + swap[i-1]),

which in turn is equal to

sum[j] - sum[i-1].

Although note we have to be careful if i is equal to 0 because otherwise we are out of the array's bounds. This is an easy check to implement.

Now we have reduced the problem to finding when sum[j] - sum[i-1] is equal to 0. But this is equivalent to finding values j and i such that sum[j] = sum[i-1].

Since we know that for all values in sum they lie between -n and n (where n is the size of the initial array) you can now create another a pair of arrays min and max of size 2n+1. Here, the indices of min and max correspond to potential values of sum, where min[0] will hold the smallest index i for which sum[i] = -n, and min[1] will hold the smallest index i for which sum[i] = -n+1, and so on. Similarly max will hold the largest index. This can also be achieved in linear time. After this step, max[i] and min[i] will correspond to values for which sum[min[i]] = i = sum[max[i]].

Now all you have to do is find the largest value of max[k] - min[k], and this will give you from above i = min[k] + 1 and j = max[k], the indices of a maximal subarray containing an equal number of 0s and 1s. This is also O(n).

I sorta sketched this out a bit roughly, so you have to be careful when i = 0, but that's easily accounted for. Each step is however O(n), so there's your more efficient algorithm.