python performance serialization deserialization

Python: performance comparison of using `pickle` or `marshal` and using `re`


I am calculating some very large numbers using Python, and I'd like to store previously calculated results in Berkeley DB.

The problem is that Berkeley DB only stores strings, while each calculation result is a tuple of integers.

For example, if my result is (m, n), one option is to store it as "%d,%d" % (m, n) and parse it back with re. Another is to serialize the tuple with pickle or marshal.
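To make the three options concrete, here is a minimal Python 3 sketch of each round trip. The helper names (`encode_text`, `decode_text`, `PAIR_RE`) and the sample values are hypothetical and chosen only for illustration; the Berkeley DB calls themselves are left out, since only the byte string handed to the database matters here.

```python
import marshal
import pickle
import re

# Hypothetical helpers for the "%d,%d" text format; not part of any library.
PAIR_RE = re.compile(r"(-?\d+),(-?\d+)")


def encode_text(pair):
    """Format an (m, n) tuple as ASCII bytes such as b'12,-7'."""
    m, n = pair
    return ("%d,%d" % (m, n)).encode("ascii")


def decode_text(data):
    """Parse bytes produced by encode_text back into a tuple of ints."""
    match = PAIR_RE.fullmatch(data.decode("ascii"))
    if match is None:
        raise ValueError("not an 'm,n' pair: %r" % (data,))
    return int(match.group(1)), int(match.group(2))


# Sample value (an assumption): two integers far larger than 64 bits.
pair = (2 ** 200, -(3 ** 150))

text_blob = encode_text(pair)
pickle_blob = pickle.dumps(pair, protocol=pickle.HIGHEST_PROTOCOL)
marshal_blob = marshal.dumps(pair)

# Each encoding should give back exactly the tuple that went in.
assert decode_text(text_blob) == pair
assert pickle.loads(pickle_blob) == pair
assert marshal.loads(marshal_blob) == pair
```

All three produce a byte string that can be used as a database value. One caveat for the text format: recent Python 3 releases cap decimal int/str conversion at 4300 digits by default (adjustable with `sys.set_int_max_str_digits`), which may matter if the numbers are extremely large.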

Which has the better performance?


Solution

  • For pure speed, marshal will get you the fastest results.

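    Timings (Python 2, where cPickle is the C implementation of pickle), in seconds per 10,000 calls:

    >>> import timeit
    >>> timeit.timeit("pickle.dumps([1,2,3])","import pickle",number=10000)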
    0.2939901351928711
    >>> timeit.timeit("json.dumps([1,2,3])","import json",number=10000)
    0.09756112098693848
    >>> timeit.timeit("pickle.dumps([1,2,3])","import cPickle as pickle",number=10000)
    0.031056880950927734
    >>> timeit.timeit("marshal.dumps([1,2,3])","import marshal", number=10000)
    0.00703883171081543
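The timings above cover only `dumps` on a small list. As a rough sketch of how the comparison could be repeated for the case in the question, the following Python 3 script times a full round trip (encode and decode) of a tuple of two large integers. The sample value, the helper names, and the use of `str.split` in place of `re` for the text format are assumptions made for illustration. No results are shown because they depend on the machine and the Python version.

```python
import marshal
import pickle
import timeit

# Sample value (an assumption): two integers of a few hundred bits each.
pair = (2 ** 300 + 1, 3 ** 200)


def roundtrip_text(value):
    """Encode as b'm,n' and parse back with split and int."""
    m, n = value
    data = ("%d,%d" % (m, n)).encode("ascii")
    left, right = data.decode("ascii").split(",")
    return int(left), int(right)


def roundtrip_pickle(value):
    """Serialize and deserialize with the highest pickle protocol."""
    return pickle.loads(pickle.dumps(value, pickle.HIGHEST_PROTOCOL))


def roundtrip_marshal(value):
    """Serialize and deserialize with marshal."""
    return marshal.loads(marshal.dumps(value))


candidates = {
    "text": roundtrip_text,
    "pickle": roundtrip_pickle,
    "marshal": roundtrip_marshal,
}

results = {}
for name, func in candidates.items():
    assert func(pair) == pair  # sanity check before timing
    results[name] = timeit.timeit(lambda: func(pair), number=10000)

# Print the fastest candidate first.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print("%-8s %.4f s per 10,000 round trips" % (name, seconds))
```

Speed is not the only consideration for data kept in a database: the Python documentation notes that the marshal format is not guaranteed to stay compatible between Python versions, so values written with marshal may need to be rewritten after an upgrade, whereas pickle and the plain text format do not have that restriction.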