python pandas memory-management python-itertools

Memory errors with itertools and pandas?


I am trying to generate the following stepped sequence pattern, but Python throws a MemoryError:

import numpy as np
import pandas as pd
import itertools

Temp = np.linspace(-5,5,pow(2,16))

# Repeat the list [Temp] to get two identical columns (not Temp*2, which doubles the values)
df = pd.DataFrame([Temp]*2, index=['ColA','ColB']).T

print(df)

df2 = pd.DataFrame([e for e in itertools.product(df.ColA,df.ColB)],columns=df.columns)

print(df2)

Errors

df2 = pd.DataFrame([e for e in itertools.product(df.ColA,df.ColB)],columns=df.columns)
MemoryError

How can I fix this?


Solution

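  • With power=16, each column has 2^16 = 65,536 values, and itertools.product (which yields the cartesian product) pairs every value in ColA with every value in ColB. You are therefore building a list of (2^16)^2 = 2^32 = 4,294,967,296 tuples, one per row of the DataFrame. Do you really want a sequence that long? The loop below shows how quickly the row count grows with the exponent: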

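    import numpy as np
    import pandas as pd
    from itertools import product

    power = 16
    for i in range(power):
        Temp = np.linspace(-5, 5, pow(2, i))
        df = pd.DataFrame([Temp] * 2, index=['ColA','ColB']).T
        # Print the exponent, the column length, and the number of product tuples
        print(i, len(df), len(list(product(df.ColA, df.ColB))))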
    
    0 1 1
    1 2 4
    2 4 16
    3 8 64
    4 16 256
    5 32 1024
    6 64 4096
    7 128 16384
    8 256 65536
    9 512 262144
    10 1024 1048576
    11 2048 4194304
    12 4096 16777216
    13 8192 67108864
    14 16384 268435456
    ...
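
To put a number on the memory involved, here is a minimal sketch, not part of the original answer. The helper names product_size and cartesian_frame are hypothetical. It assumes the values are stored as 8-byte floats in two columns, and it builds the product with np.repeat and np.tile instead of a Python list of tuples, which is only practical for much smaller inputs than 2^16.

```python
import numpy as np
import pandas as pd


def product_size(n, n_cols=2, itemsize=8):
    """Return (rows, bytes) for an n x n cartesian product stored as float64.

    Hypothetical helper: counts raw array storage only, ignoring index overhead.
    """
    rows = n * n
    return rows, rows * n_cols * itemsize


def cartesian_frame(values, columns=("ColA", "ColB")):
    """Build the cartesian product of `values` with itself using NumPy arrays."""
    values = np.asarray(values)
    n = len(values)
    # np.repeat makes the first column vary slowest and np.tile makes the
    # second vary fastest, matching the row order of itertools.product.
    return pd.DataFrame({columns[0]: np.repeat(values, n),
                         columns[1]: np.tile(values, n)})


rows, nbytes = product_size(2 ** 16)
print(rows, nbytes / 2 ** 30)   # 4294967296 rows, 64.0 GiB

small = cartesian_frame(np.linspace(-5, 5, 2 ** 4))
print(len(small))               # 256
```

Even without the per-tuple overhead of a Python list, the full 2^16 case needs about 64 GiB for the raw values alone, so the array approach helps only if a smaller grid is acceptable or the product is processed piece by piece.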