python python-2.7 numpy pickle bz2

Faster repetitive uses of bz2.BZ2File for pickling


I'm pickling multiple objects repeatedly, but not consecutively. The pickled output files turned out to be too large (about 256MB each).

So I tried bz2.BZ2File instead of open, and each file shrank to 1.3MB. (Yeah, wow.) The problem is that it is too slow (about 95 seconds to pickle one object), and I want to speed it up.
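
For reference, the setup described above might look roughly like the sketch below. The helper names dump_bz2 and load_bz2 are hypothetical, and passing pickle.HIGHEST_PROTOCOL is an assumption, since the question does not say which pickle protocol is in use. On Python 2.7 the default is the ASCII protocol 0, which tends to be slow and bulky for numpy arrays, so the protocol may be worth checking before changing the compressor.

```python
import bz2
import pickle
import time


def dump_bz2(obj, path, compresslevel=9):
    """Pickle obj into a bz2-compressed file and return the elapsed seconds.

    Hypothetical helper: the name and the choice of protocol are assumptions.
    """
    start = time.time()
    f = bz2.BZ2File(path, "wb", compresslevel=compresslevel)
    try:
        # A binary protocol is usually much faster than protocol 0.
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
    finally:
        f.close()
    return time.time() - start


def load_bz2(path):
    """Read back an object written by dump_bz2."""
    f = bz2.BZ2File(path, "rb")
    try:
        return pickle.load(f)
    finally:
        f.close()
```

Timing dump_bz2 with a lower compresslevel (1 is the lowest) is a cheap way to see how much of the 95 seconds is spent in bz2 itself.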

Each object is a dictionary, and most of them have a similar structure (or hierarchy, if that describes it better): almost the same set of keys, and the value for each key normally has some specific structure of its own, and so on. Many of the dictionary values are numpy arrays, and I expect them to contain many zeros.

Can you give me some advice to make it faster?

Thank you!


Solution

  • I ended up using lz4, a very fast compression algorithm that trades some compression ratio for speed.

    There is a Python wrapper, which can be installed easily:

    pip install lz4
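
A minimal sketch of how pickling through lz4 might look, assuming the lz4.frame module that ships with that package. The helper names dump_compressed and load_compressed are hypothetical, and the zlib fallback is an addition for illustration only, so that the sketch still runs where lz4 is not installed.

```python
import pickle
import zlib

try:
    import lz4.frame as _lz4  # provided by `pip install lz4`
except ImportError:
    _lz4 = None  # assumption: fall back to zlib so the sketch still runs


def _compress(raw):
    if _lz4 is not None:
        return _lz4.compress(raw)
    return zlib.compress(raw, 1)  # level 1 favors speed over ratio


def _decompress(blob):
    if _lz4 is not None:
        return _lz4.decompress(blob)
    return zlib.decompress(blob)


def dump_compressed(obj, path):
    """Pickle obj in memory, compress the bytes, and write them to path."""
    raw = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    with open(path, "wb") as f:
        f.write(_compress(raw))


def load_compressed(path):
    """Read path, decompress, and unpickle the stored object."""
    with open(path, "rb") as f:
        return pickle.loads(_decompress(f.read()))
```

Note that a file must be read back with the same compressor that wrote it. The resulting files will likely be larger than the bz2 ones, so it is worth measuring both size and time on real data.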