Tags: python, pandas, memory, combinations, python-itertools

How can I get all combinations of n binary values in Python when a MemoryError occurs?


I am trying to get all combinations of n binary values (0 and 1). Here is the code I wrote:

from itertools import product
import pandas as pd

k = 3  # number of binary values per combination (small values work fine)
combinations = pd.DataFrame(product(range(2), repeat=k))

This works when k is small. However, I need all combinations of at least 30 binary values. For example, I tried k=31 and got a MemoryError:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-5-97fdebdd2a99> in <module>
----> 1 pd.DataFrame(product(range(2),repeat=k))

~\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    467         elif isinstance(data, abc.Iterable) and not isinstance(data, (str, bytes)):
    468             if not isinstance(data, (abc.Sequence, ExtensionArray)):
--> 469                 data = list(data)
    470             if len(data) > 0:
    471                 if is_list_like(data[0]) and getattr(data[0], "ndim", 1) == 1:

MemoryError: 

I have tried running this code on a computer with 128 GB of RAM and a 64-bit (x64) Python build, but I still get only a MemoryError instead of the result.
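A back-of-the-envelope estimate helps explain why even 128 GB is not enough. The sketch below (the helper name `dataframe_bytes` is hypothetical) counts only the raw cells of a 2**k by k table at a fixed width per cell; it ignores the index and the `list(data)` step shown in the traceback, which holds 2**31 Python tuple objects at once and adds substantial overhead on top.

```python
def dataframe_bytes(k, bytes_per_cell):
    """Rough size of a 2**k by k table of fixed-width cells, ignoring overhead."""
    # 2**k rows (one per combination) times k columns times cell width.
    return (1 << k) * k * bytes_per_cell


for label, width in [("int64", 8), ("uint8", 1)]:
    size = dataframe_bytes(31, width)
    print(f"k=31, {label}: {size / 1e9:.1f} GB")

# k=31, int64: 532.6 GB
# k=31, uint8: 66.6 GB
```

pandas typically infers int64 for integer data like this. On paper a uint8 table for k=31 would fit in 128 GB, but building it through `product` and `list` never gets that far, so using fewer bytes per cell or working in chunks matters more than adding RAM.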

For example, is it possible to create two or more DataFrames that together form the desired DataFrame? However, I have no idea how to split the computation so that each part generates some of the combinations, and then combine the parts at the end.
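Splitting the work is possible, with one caveat: concatenating every chunk at the end recreates the same huge table, so each chunk is best processed (filtered, aggregated, or written to disk) and then discarded. A minimal sketch, assuming NumPy is available and that combination number i is simply the k-bit binary form of i, which is the same order `itertools.product(range(2), repeat=k)` produces; `binary_chunks` and `chunk_size` are hypothetical names:

```python
import numpy as np
import pandas as pd


def binary_chunks(k, chunk_size=1 << 18):
    """Yield DataFrames holding consecutive blocks of all 2**k binary rows.

    Assumption: row i is the k-bit binary form of i, most significant bit
    first, matching the order of itertools.product(range(2), repeat=k).
    """
    # Bit positions from most significant (k-1) down to 0.
    shifts = np.arange(k - 1, -1, -1, dtype=np.int64)
    total = 1 << k
    for start in range(0, total, chunk_size):
        stop = min(start + chunk_size, total)
        values = np.arange(start, stop, dtype=np.int64)
        # Broadcast (rows, 1) against (k,), shift and mask out each bit;
        # uint8 keeps each cell at one byte.
        bits = ((values[:, None] >> shifts) & 1).astype(np.uint8)
        yield pd.DataFrame(bits, index=pd.RangeIndex(start, stop))
```

With the default chunk size, `for chunk in binary_chunks(31): ...` yields 2**31 / 2**18 = 8192 DataFrames of 262,144 rows each, one at a time, so only one chunk has to be in memory at any moment.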

Alternatively, is there another way to get all of these combinations in Python?

I could really use your help.


Solution

  • You can generate all binary strings of N bits (which seems to be what you want here) with a generator like

    def generate_binary_strings(n):
        format_string = f"{{:0{n}b}}"
        for x in range(1 << n):
            yield format_string.format(x)
    
    
    for x in generate_binary_strings(4):
        print(x)
    

    This outputs

    0000
    0001
    0010
    0011
    0100
    0101
    0110
    0111
    1000
    1001
    1010
    1011
    1100
    1101
    1110
    1111
    

    I'd still advise against putting them in a list, though: with n=30 that is 2**30 (about 1.07 billion) strings :)
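Because the generator is lazy, you can consume only as much of it as you need without storing the rest. And because combination number i is just the binary representation of i, any single row can be recomputed on demand instead of stored. A small sketch using `itertools.islice`; `nth_combination` is a hypothetical helper, not part of any library:

```python
from itertools import islice


def generate_binary_strings(n):
    # Same generator as in the answer: one zero-padded n-bit string per integer.
    format_string = f"{{:0{n}b}}"
    for x in range(1 << n):
        yield format_string.format(x)


def nth_combination(n, i):
    """Hypothetical helper: bits of combination number i, most significant first."""
    return tuple((i >> (n - 1 - pos)) & 1 for pos in range(n))


# Only the first five strings are ever created; the rest are never generated.
first_five = list(islice(generate_binary_strings(30), 5))

# Jump straight to any row without enumerating the ones before it.
row = nth_combination(30, 123456789)
```

The design choice here is to treat the integer itself as the stored form of each combination and only expand it to bits or strings at the moment you need them.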