python performance serialization deserialization

Python: performance of `pickle` or `marshal` versus parsing with `re`

I am calculating some very large numbers using Python, and I'd like to store previously calculated results in Berkeley DB.

The problem is that Berkeley DB stores only strings, and each calculation result is a tuple of integers.

For example, if my result is (m, n), one way is to store it as "%d,%d" % (m, n) and parse it back with re. Alternatively, I can serialize the tuple with pickle or marshal.
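To make the three options concrete, here is a minimal sketch of each round trip. The sample tuple, the regular expression, and all variable names are assumptions chosen for illustration; they are not taken from the actual calculation.

```python
import marshal
import pickle
import re

# Hypothetical result: a tuple of two very large integers.
result = (2**200 + 1, 3**150)

# Option 1: store as plain text, read back with a regular expression.
text = "%d,%d" % result
match = re.match(r"(-?\d+),(-?\d+)$", text)
from_text = (int(match.group(1)), int(match.group(2)))

# Option 2: pickle the tuple to a byte string and load it back.
pickled = pickle.dumps(result, pickle.HIGHEST_PROTOCOL)
from_pickle = pickle.loads(pickled)

# Option 3: marshal the tuple to a byte string and load it back.
marshalled = marshal.dumps(result)
from_marshal = marshal.loads(marshalled)
```

All three approaches recover the original tuple, so the choice comes down to speed, storage size, and how portable the stored bytes need to be.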

Which of these has the best performance?

Solution

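For pure speed, marshal is the fastest of the serializers timed below. Keep in mind that the marshal format is not guaranteed to stay compatible between Python versions, which matters for data that lives in a database.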

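Timings for serializing a small list 10,000 times, in seconds (cPickle exists only in Python 2):

>>> import timeit
>>> timeit.timeit("pickle.dumps([1,2,3])","import pickle",number=10000)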
0.2939901351928711
>>> timeit.timeit("json.dumps([1,2,3])","import json",number=10000)
0.09756112098693848
>>> timeit.timeit("pickle.dumps([1,2,3])","import cPickle as pickle",number=10000)
0.031056880950927734
>>> timeit.timeit("marshal.dumps([1,2,3])","import marshal", number=10000)
0.00703883171081543
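The timings above cover only dumps on a small list, so they do not measure the string-and-re approach or the cost of reading values back. A rough sketch of a full round-trip comparison for a tuple of large integers might look like the following. The sample tuple, the function names, and the extra str.split variant are assumptions added for illustration, and the numbers will differ by machine and Python version, so no results are shown.

```python
import marshal
import pickle
import re
import timeit

# Hypothetical sample result: a tuple of two very large integers.
PAIR = (2**200 + 1, 3**150)
PATTERN = re.compile(r"(-?\d+),(-?\d+)$")


def roundtrip_re(pair=PAIR):
    # Format as text, then parse with a precompiled regular expression.
    text = "%d,%d" % pair
    m, n = PATTERN.match(text).groups()
    return int(m), int(n)


def roundtrip_split(pair=PAIR):
    # Same text format, parsed with str.split instead of re.
    text = "%d,%d" % pair
    m, n = text.split(",")
    return int(m), int(n)


def roundtrip_pickle(pair=PAIR):
    return pickle.loads(pickle.dumps(pair, pickle.HIGHEST_PROTOCOL))


def roundtrip_marshal(pair=PAIR):
    return marshal.loads(marshal.dumps(pair))


def benchmark(number=10000):
    # Map each function name to the total seconds for `number` round trips.
    candidates = [roundtrip_re, roundtrip_split, roundtrip_pickle, roundtrip_marshal]
    return {f.__name__: timeit.timeit(f, number=number) for f in candidates}


if __name__ == "__main__":
    for name, seconds in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print("%-18s %.4f s" % (name, seconds))
```

Running this with numbers of the size actually being stored gives a more representative comparison than the small-list timings, since converting very large integers to and from decimal text has its own cost.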