Tags: python, python-3.x, pickle, dill

How to pickle a class instance with persistent methods in Python?


I want to serialize a class instance in Python so that its methods persist along with its attributes. I have tried joblib and pickle, and I am really close with dill, but I can't quite get it working.

Here is the problem. Say I want to pickle a class instance like so:

import dill

class Test():
    def __init__(self, value=10):
        self.value = value
    def foo(self):
        print(f"Bar! Value is: {self.value}")

t = Test(value=20)
with open('Test.pkl', 'wb+') as fp:
    dill.dump(t, fp)

# Test it
print('Original: ')
t.foo()        # Prints "Bar! Value is: 20"

Later, the definition of Test changes and when I reload my pickled object the method is different:

class Test():
    def __init__(self, value=10):
        self.value = value
    def foo(self):
        print("...not bar?")

with open('Test.pkl', 'rb') as fp:
    t2 = dill.load(fp)

# Test it
print('Reloaded: ')
t2.foo()        # Prints "...not bar?"

In the reloaded case, the attribute value is preserved (t2.value is 20), but foo comes from the new class definition. I can get really close to what I want by serializing the class with dill instead of the instance, like so:

class Test():
    def __init__(self, value=10):
        self.value = value
    def foo(self):
        print(f"Bar! Value is: {self.value}")

t = Test(value=20)
with open('Test.pkl', 'wb+') as fp:
    dill.dump(Test, fp)

# Test it
print('Original: ')
t.foo()        # Prints "Bar! Value is: 20"

But when I rebuild it, I get the old method (which is what I want) and lose the attributes of the instance t (in this case I get the default value of 10 instead of the instance value of 20):

class Test():
    def __init__(self, value=10):
        self.value = value
    def foo(self):
        print("...not bar?")

with open('Test.pkl', 'rb') as fp:
    test_class = dill.load(fp)
    t2 = test_class()

# Test it
print('Reloaded: ')
t2.foo()        # Prints "Bar! Value is: 10"

In my actual use case, the class instance has a lot of attributes. I want to pickle the attributes as well as the methods so that later source code changes don't make that particular object unrecoverable.

Currently, to recover these objects, I copy the old source code files, but the imports get very messy: it takes a lot of confusing sys.path manipulation to make sure I load the correct old source code. I could also pickle the class definition with dill, save all the attributes to JSON or similar, and rebuild the object that way, but I'm wondering if there is an easy way to do this with dill or some other package that I have not yet discovered. It seems like a straightforward use case to me.


Solution

  • I'm the dill author. dill serializes the class definition with the instance, so you don't need to do it yourself. However, by default, if the class has been redefined by the time you load, the updated definition is used. If you want to ignore the updated definition on load and use the stored one instead, pass the ignore keyword (ignore=True).

    >>> import dill
    >>> 
    >>> class Test():
    ...     def __init__(self, value=10):
    ...         self.value = value
    ...     def foo(self):
    ...         print(f"Bar! Value is: {self.value}")
    ... 
    >>> t = Test(value=20)
    >>> s = dill.dumps(t)
    >>> t.foo()
    Bar! Value is: 20
    >>> 
    >>> class Test():
    ...     def __init__(self, value=10):
    ...         self.value = value
    ...     def foo(self):
    ...         print("...not bar?")
    ... 
    >>> t2 = dill.loads(s, ignore=True)
    >>> t2.foo()
    Bar! Value is: 20
    

    Is this what you are looking for?

    dill also provides dill.settings, whose options can also be passed as keywords to dump and load, to change how objects are stored and loaded. recurse=True gives behavior similar to cloudpickle, while byref=True gives behavior similar to pickle.
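
For contrast, the by-reference behavior attributed to pickle above can be seen with the standard library alone. This is an illustrative sketch, not part of the original answer: it checks that a pickle of an instance contains the class name and the instance state but none of the method body, so the method always comes from whichever definition exists at load time.

```python
import pickle


class Test:
    def __init__(self, value=10):
        self.value = value

    def foo(self):
        return f"Bar! Value is: {self.value}"


payload = pickle.dumps(Test(value=20))

# The payload names the class and holds the instance state,
# but it does not contain the method's code or string constants.
print(b"Test" in payload)
print(b"value" in payload)
print(b"Bar!" in payload)


# Redefine the class under the same name, as a later source change would.
class Test:
    def __init__(self, value=10):
        self.value = value

    def foo(self):
        return "...not bar?"


# pickle looks the class up by name at load time, so the new foo is used,
# while the stored attribute value is restored.
t2 = pickle.loads(payload)
print(t2.value, t2.foo())
```

Because nothing about foo is stored, plain pickle has no old definition to fall back on, which is why the stored-definition approach needs dill.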