Search code examples
algorithmbinarysequencecatalan

Algorithm for finding kth binary number with certain properties


Let's assume we will consider binary numbers which has length 2n and n might be about 1000. We are looking for kth number (k is limited by 10^9) which has following properties:

  • Amount of 1's is equal to amount of 0's what can be described as following: #(1) = #(0)
  • Every prefix of this number has to contain atleast as much 0's as 1's. It might be easier to understand it after negating the sentence, which is: There is no prefix which would contain more 1's than 0's.

And basically that's it. So to make it clear let's do some example: n=2, k=2 we have to take binary number of length 2n:

0000
0001
0010
0011
0100
0101
0110
0111
1000
and so on...

And now we have to find 2nd number which fulfill those two requirements. So we see 0011 is the first one, and 0101 is second one. If we change k=3, then answer doesn't exist since there are number which have same amount of opposite bits, but for 0110, there is prefix 011 so number doesn't fulfill second constraint and same would be with all numbers which has 1 as most significant bit.

So what I did so far to find algorithm?

Well my first idea was to generate all possible bits settings, and check whether it has those two properties, but generate them all would take O(2^(2n)) which is not an option for n=1000.

Additionally I realize there is no need to check all numbers which are smaller than 0011 for n=2, 000111 for n=3, and so on... frankly speaking those which half of most significant bits remains "untouched" because those numbers have no possibility to fulfill #(1) = #(0) condition. Using that I can reduce n by half, but it doesn't help much. Instead of 2 * forever I have forever running algorithm. It's still O(2^n) complexity, which is way too big.

Any idea for algorithm?

Conclusion

This text has been created as a result of my thoughts after reading Andy Jones post.

First of all I wouldn't post code I have used since it's point 6 in following document from Andy's post Kasa 2009. All you have to do is consider nr as that what I described as k. Unranking Dyck words algorithm, would help us find out answer much faster. However it has one bottleneck.

while (k >= C(n-i,j))

Considering that n <= 1000, Catalan number can be quite huge, even C(999,999). We can use some big number arithmetic, but on the other hand I came up with little trick to overpass it and use standard integer.

We don't want to know how big actually Catalan number is as long as it's bigger than k. So now we will create Catalan numbers caching partial sums in n x n table.

...                                     ...
5 |                              42     ...
4 |                        14    42     ...
3 |                   5    14    28     ...
2 |             2     5     9    14     ...
1 |       1     2     3     4     5     ...
0 | 1     1     1     1     1     1     ...
   ----------------------------------   ...
    0     1     2     3     4     5     ...

To generate it is quite trivial:

C(x,0) = 1
C(x,y) = C(x,y-1) + C(x-1,y)  where y > 0 && y < x
C(x,y) = C(x,y-1) where x == y

So what we can see only this:

C(x,y) = C(x,y-1) + C(x-1,y)  where y > 0 && y < x

can cause overflow.

Let's stop at this point and provide definition.

k-flow - it's not real overflow of integer but rather information that value of C(x,y) is bigger than k.

My idea is to check after each running of above formula whether C(x,y) is grater than k or any of sum components is -1. If it is we put -1 instead, which would act as a marker, that k-flow has happened. I guess it quite obvious that if k-flow number is sum up with any positive number it's still be k-flowed in particular sum of 2 k-flowed numbers is k-flowed.

The last what we have to prove is that there is no possibility to create real overflow. Real overflow might only happen if we sum up a + b which non of them is k-flowed but as sum they generated the real overflow.

Of course it's impossible since maximum value can be described as a + b <= 2 * k <= 2*10^9 <= 2,147,483,647 where last value in this inequality is value of int with sign. I assume also that int has 32 bits, as in my case.


Solution

  • The numbers you are describing correspond to Dyck words. Pt 2 of Kasa 2009 gives a simple algorithm for enumerating them in lexicographic order. Its references should be helpful if you want to do any further reading.

    As an aside (and be warned I'm half asleep as I write this, so it might be wrong), the wikipedia article notes that the number of Dyck words of length 2n is the n th Catalan number, C(n). You might want to find the smallest n such that C(n) is larger than the k you're looking for, and then enumerate Dyck words starting from X^n Y^n.