Search code examples
pythonmultithreadingnumpymultiprocessingpython-itertools

Find all binary strings of certain weight has fast as possible


I want to find binary strings of a certain weight. The amount of such strings grows to the point of a memory error, so I'm currently generating them with a generator. This code generators all length n binary strings with weight k:

def kbits(n, k):
    for bits in itertools.combinations(range(n), k):
        s = ['0'] * n
        for bit in bits:
            s[bit] = '1'
        yield ''.join(s)

for b in kbits(length, weight):
    print(b)

So for length = 3 and weight = 2, we get 110, 101, 011.

My research requires me to parse through values such as n = 56 and k = 7, which takes around 24 hours on my device. I'd also like to try n = 72 and k = 8, which (based on the time of the previous result) may take 365 days. So I'm wondering two things:

  1. Is this the quickest (non-memory) intensive way of generating these binary strings?

  2. Is it possible to have multiple cores of my CPU working on this at once? I'm assuming itertools is parsing through a sequence. If (let's say) we had a 2-core CPU, would it be possible to have the first core parse the first 50% of the sequence and the second core to do the latter half?

EDIT:

Perhaps I should mention that for each boolean b, I'd like to perform the following least-squares computation, where N is some defined matrix:

for b in kbits(size, max_coclique):
    v = np.linalg.lstsq(N,np.array(list(b), dtype = float))

i.e. I require the ultimate expected output format for b to be a numpy array with 0/1 values. (That is unless there is some extremely fast way of doing all of this - including the least-squares computation - in a different way.)

Note: I'm also running this in Sage, as I am utilizing its database of transitive groups.


Solution

  • Given a value with weight k, you can get the lexically next value as follows:

    1. Find the rightmost 0 to the left of the rightmost 1.
    2. move a 1 from the right into that 0
    3. move all the other 1s to the right of that zero as far right as possible.

    This is the binary version of the Pandita algorithm: https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order

    You can do it with bit manipulations like this:

    def kbits(n, k):
        limit=1<<n
        val=(1<<k)-1
        while val<limit:
            yield "{0:0{1}b}".format(val,n)
            minbit=val&-val #rightmost 1 bit
            fillbit = (val+minbit)&~val  #rightmost 0 to the left of that bit
            val = val+minbit | (fillbit//(minbit<<1))-1
    

    There are probably some opportunities for optimization remaining, but the time will be dominated by formatting the values as binary strings in the yield statement.