Search code examples
pythonpicklememoization

Is the pickling process deterministic?


Does Pickle always produce the same output for a certain input value? I suppose there could be a gotcha when pickling dictionaries that have the same contents but different insert/delete histories. My goal is to create a "signature" of function arguments, using Pickle and SHA1, for a memoize implementation.


Solution

  • I suppose there could be a gotcha when pickling dictionaries that have the same contents but different insert/delete histories.

    Right:

    >>> pickle.dumps({1: 0, 9: 0}) == pickle.dumps({9: 0, 1: 0})
    False
    

    See also: pickle.dumps not suitable for hashing

    My goal is to create a "signature" of function arguments, using Pickle and SHA1, for a memoize implementation.

    There's a number of fundamental problems with this. It's impossible to come up with an object-to-string transformation that maps equality correctly—think of the problem of object identity:

    >>> a = object()
    >>> b = object()
    >>> a == b
    False
    >>> pickle.dumps(b) == pickle.dumps(a)
    True
    

    Depending on your exact requirements, you may be able to transform object hierarchies into ones that you could then hash:

    def hashablize(obj):
        """Convert a container hierarchy into one that can be hashed.
        
        Don't use this with recursive structures!
        Also, this won't be useful if you pass dictionaries with
        keys that don't have a total order.
        Actually, maybe you're best off not using this function at all."""
        try:
            hash(obj)
        except TypeError:
            if isinstance(obj, dict):
                return tuple((k, hashablize(v)) for (k, v) in sorted(obj.iteritems()))
            elif hasattr(obj, '__iter__'):
                return tuple(hashablize(o) for o in obj)
            else:
                raise TypeError("Can't hashablize object of type %r" % type(obj))
        else:
            return obj