
MemoryError in 64-bit Python?


I'm running 64-bit Python 3 on Linux, and I have code that generates lists of about 20,000 elements. A MemoryError occurred when my code tried to write a list of ~20,000 2D arrays to a binary file via the pickle module, even though it had generated all of these arrays and appended them to the list without a problem. I know this must take up a lot of memory, but the machine I'm using has about 100 GB available (as reported by free -m). The line with the error:

import pickle

with open('all_data.data', 'wb') as f:
    pickle.dump(data, f)  # raises MemoryError

where data is my list of ~20,000 NumPy arrays. Previously, I tried to run this code with about 55,000 elements, but when it was about 40% of the way through appending the arrays to the data list, the process simply printed Killed and stopped. So now I'm trying to break the work into segments, but this time I get a MemoryError. How can I get around this? I was also told that I have access to multiple CPUs, but I have no idea how to take advantage of them (I don't understand multiprocessing yet).


Solution

  • Pickle will try to process all of your data, and will likely build intermediate representations of it in memory before writing everything to disk. So if your data already occupies about half of your available memory, it will blow up.

    Since your data is already in a list, an easy workaround is to pickle and write each array separately, instead of trying to serialize all 20,000 arrays in a single go:

    import pickle

    with open('all_data.data', 'wb') as f:
        for item in data:
            # One complete pickle per array, written one after another
            pickle.dump(item, f)
    

    Then, to read it back, keep unpickling objects from the file and appending them to a list until the file is exhausted:

    import pickle

    data = []
    with open('all_data.data', 'rb') as f:
        while True:
            try:
                data.append(pickle.load(f))
            except EOFError:
                # No more pickled objects left in the file
                break
    

    This works because unpickling from a file is well behaved: pickle.load leaves the file position exactly where the pickled object it just read ends, so the next read starts at the beginning of the following object.
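The two snippets above can be combined into a pair of reusable helpers. This is a sketch for illustration, not part of the original answer: the names dump_all and iter_load are hypothetical, and passing protocol=pickle.HIGHEST_PROTOCOL is an assumption (protocol 4 and later support pickling very large objects). iter_load is a generator, so you can process one array at a time without rebuilding the full list, and dump_all accepts any iterable, including a generator that produces arrays on the fly, so the full list never has to exist in memory. The demo uses small lists so it runs without NumPy; NumPy arrays are picklable and go through the same functions unchanged. It also checks the file-position behavior described above by comparing f.tell() after each load with the offset recorded after each dump.

```python
import os
import pickle
import tempfile
from typing import Any, Iterable, Iterator, List


def dump_all(items: Iterable[Any], path: str) -> List[int]:
    """Pickle each item separately into one file (hypothetical helper).

    Returns the file offset at which each item's pickle ends.
    """
    ends = []
    with open(path, 'wb') as f:
        for item in items:
            # One complete, self-delimiting pickle per item.
            pickle.dump(item, f, protocol=pickle.HIGHEST_PROTOCOL)
            ends.append(f.tell())
    return ends


def iter_load(path: str) -> Iterator[Any]:
    """Yield pickled items one at a time instead of building a list (hypothetical helper)."""
    with open(path, 'rb') as f:
        while True:
            try:
                item = pickle.load(f)
            except EOFError:
                # Nothing left to read: the previous load stopped at end of file.
                return
            yield item


if __name__ == '__main__':
    # Small stand-in data; NumPy arrays would be handled the same way.
    data = [list(range(n)) for n in range(5)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'all_data.data')
        ends = dump_all(data, path)

        with open(path, 'rb') as f:
            for end in ends:
                pickle.load(f)
                # After each load, the position is exactly where that pickle ended.
                assert f.tell() == end

        assert list(iter_load(path)) == data
        print(ends[-1] == os.path.getsize(path))  # True
```

Returning the end offsets only serves to make the file-position check visible; a real script could drop that and simply loop over iter_load('all_data.data').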