python json pickle simplejson

What is faster - Loading a pickled dictionary object or Loading a JSON file - to a dictionary?


What is faster:

(A) 'Unpickling' (Loading) a pickled dictionary object, using pickle.load()

or

(B) Loading a JSON file to a dictionary using simplejson.load()

Assuming: The pickled object file exists already in case A, and that the JSON file exists already in case B.


Solution

  • The speed actually depends on the data, its content and size.

    But, anyway, let's take some example JSON data and see what is faster (Ubuntu 12.04, Python 2.7.3):

    Given this JSON structure, dumped into the test.json and test.pickle files:

    {
        "glossary": {
            "title": "example glossary",
            "GlossDiv": {
                "title": "S",
                "GlossList": {
                    "GlossEntry": {
                        "ID": "SGML",
                        "SortAs": "SGML",
                        "GlossTerm": "Standard Generalized Markup Language",
                        "Acronym": "SGML",
                        "Abbrev": "ISO 8879:1986",
                        "GlossDef": {
                            "para": "A meta-markup language, used to create markup languages such as DocBook.",
                            "GlossSeeAlso": ["GML", "XML"]
                        },
                        "GlossSee": "markup"
                    }
                }
            }
        }
    }
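
    The answer does not show how the two files were written, so the following is only an assumed sketch of that step. It uses the standard json and pickle modules and the file names the testing script opens; the variable name data is illustrative. It should run on Python 2.7 as well as Python 3.

```python
import json
import pickle

# The sample document from above, written as a Python literal.
data = {
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create "
                                "markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"],
                    },
                    "GlossSee": "markup",
                }
            },
        },
    }
}

# JSON is text, so the file is opened in text mode.
with open("test.json", "w") as f:
    json.dump(data, f)

# Pickle output is bytes, so binary mode is the safe choice.
with open("test.pickle", "wb") as f:
    pickle.dump(data, f)
```

    Both files then hold the same dictionary, so any timing difference comes from the format and the parser rather than from the content.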
    

    Testing script:

    import timeit
    
    import pickle
    import cPickle
    
    import json
    import simplejson
    import ujson
    import yajl
    
    
    def load_pickle(f):
        return pickle.load(f)
    
    
    def load_cpickle(f):
        return cPickle.load(f)
    
    
    def load_json(f):
        return json.load(f)
    
    
    def load_simplejson(f):
        return simplejson.load(f)
    
    
    def load_ujson(f):
        return ujson.load(f)
    
    
    def load_yajl(f):
        return yajl.load(f)
    
    
    print "pickle:"
    print timeit.Timer('load_pickle(open("test.pickle"))', 'from __main__ import load_pickle').timeit()
    
    print "cpickle:"
    print timeit.Timer('load_cpickle(open("test.pickle"))', 'from __main__ import load_cpickle').timeit()
    
    print "json:"
    print timeit.Timer('load_json(open("test.json"))', 'from __main__ import load_json').timeit()
    
    print "simplejson:"
    print timeit.Timer('load_simplejson(open("test.json"))', 'from __main__ import load_simplejson').timeit()
    
    print "ujson:"
    print timeit.Timer('load_ujson(open("test.json"))', 'from __main__ import load_ujson').timeit()
    
    print "yajl:"
    print timeit.Timer('load_yajl(open("test.json"))', 'from __main__ import load_yajl').timeit()
    

    Output (total seconds for 1,000,000 calls, the timeit default):

    pickle:
    107.936687946
    
    cpickle:
    28.4231381416
    
    json:
    31.6450419426
    
    simplejson:
    20.5853149891
    
    ujson:
    16.9352178574
    
    yajl:
    18.9763481617
    

    As you can see, unpickling via the pure-Python pickle module is by far the slowest option here. cPickle is definitely the way to go if you choose the pickling/unpickling option. Among the JSON parsers, ujson is the fastest on this particular data.

    Also, the json and simplejson libraries load much faster on PyPy (see Python JSON Performance).

    It's important to note that the results may differ on your particular system and with other types and sizes of data.
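
    To check this on your own data, a minimal sketch along the following lines may help. It is not the script used for the numbers above: it assumes Python 3 (where pickle uses its C implementation automatically when available), sticks to the standard library, and times parsing of in-memory data instead of calling open() on every iteration as the original script does. The helper name bench, the small sample data, and the iteration counts are all illustrative choices.

```python
import json
import pickle
import timeit

# Small stand-in document; substitute the data you actually care about.
data = {"glossary": {"title": "example glossary", "GlossSeeAlso": ["GML", "XML"]}}

# Serialize once, up front, so that only deserialization is timed.
pickled = pickle.dumps(data)
encoded = json.dumps(data)


def bench(func, payload, number=100000, repeat=3):
    """Return the best total time in seconds for `number` calls of func(payload)."""
    timer = timeit.Timer(lambda: func(payload))
    return min(timer.repeat(repeat=repeat, number=number))


results = {
    "pickle": bench(pickle.loads, pickled),
    "json": bench(json.loads, encoded),
}

# Print the fastest loader first.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print("%-8s %.4f s" % (name, seconds))
```

    Third-party parsers such as simplejson or ujson can be added to results in the same way if they are installed, by passing their loads function to bench.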