Search code examples
stringalgorithmlanguage-agnosticiteration

How can I iterate over balanced binary strings?


I need to systematically iterate over the set of balanced binary strings of (even) length n, i.e. the binary strings composed of an equal number of 1s and 0s. (This problem can be stated in many equivalent ways; e.g. given the set S = {1, 2, ..., n}, how to iterate over the balanced bipartitions of S, or the subsets of S of size n/2.) What is the most efficient way to do so?

The most naive algorithm is to simply iterate over all binary strings (e.g. in the standard lexicographic order) and then skip all the unbalanced ones. But this is very inefficient: there are 2^n binary strings of length n, but only (n choose n/2) are balanced. By Stirling's approximation, this quantity grows asymptotically as $\sqrt{2/(\pi n)} 2^n$, so only a small fraction ~1/n of the binary strings are balanced.

It seems to me that a more efficient implementation might go something like this: you start with the string 000...0111...1, and then hop the 1's to the left by a sequence of adjacent transpositions 01 -> 10. But I can't figure out a way to organize the transpositions so that every balanced binary string is produced once. For example, if you always transpose the leftmost 01, then you'll miss lots of strings (e.g. 0110), but if at every step you branch every possible transposition into a tree, then you'll overcount many strings (e.g. 0101 -> 1010 can be reached by performing the two transitions in either order).

(It would be nice, but not necessary, if the algorithm easily generalized to be able to iterate over the set of n-bit binary strings with an arbitrary fixed imbalance.)


Solution

  • Here's a Python recursion (could easily be converted to a stack iteration; arbitrary fixed imbalance available in the inner g function):

    def f(n):
      def g(n, k, s):
        if len(s) == n:
          return [s]
    
        if len(s) + k == n:
          return [s + (n - len(s)) * '1']
    
        if k == 0:
          return [s + (n - len(s)) * '0']
    
        return g(n, k, s + '0') + g(n, k - 1, s + '1')
    
      return g(n, n / 2, '')
    

    Output:

    => f(6)
    => ['000111', '001011', '001101', '001110', '010011', '010101', '010110', '011001', '011010', '011100', '100011', '100101', '100110', '101001', '101010', '101100', '110001', '110010', '110100', '111000']