python serialization deserialization pickle dill

isinstance() fails on an object contained in a list, after using dill.dump and dill.load


Is this expected behaviour (and if so, can someone explain why)? This only happens when using dill, not pickle.

from pathlib import Path
import dill

class MyClass:

    def __init__(self) -> None:
        pass


path = Path('test/test.pkl')
# create parent directory if it does not exist
path.parent.mkdir(exist_ok=True)

x = [ MyClass() ]
dill.dump(x, path.open('wb'))
y = dill.load(path.open('rb'))
    
print(isinstance(x[0], MyClass)) # True
print(isinstance(y[0], MyClass)) # False ???

I was expecting True.


Solution

  • The reason for this is that dill pickles the MyClass class object itself (by value) and re-creates it when deserializing your object. As far as I know, this is dill's default for classes defined in __main__, as is the case here. Hence MyClass (which is also x[0].__class__) is a different object from the deserialized y[0].__class__, so the isinstance check against MyClass fails.

    print(id(MyClass))
    # 140430969773264
    
    print(id(x[0].__class__)) # same as above
    # 140430969773264
    
    print(id(y[0].__class__)) # different
    # 140430969780544
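
    The same effect can be reproduced without dill. The following is a minimal sketch that only imitates a by-value round trip by rebuilding the class with type(); it is not dill's actual code path, and the names Rebuilt and namespace are made up for illustration.

```python
class MyClass:
    def __init__(self) -> None:
        pass


# Copy the class namespace, leaving out the two descriptors that are
# bound to the original class and must not be reused by a new one.
namespace = {
    key: value
    for key, value in vars(MyClass).items()
    if key not in ("__dict__", "__weakref__")
}

# Build a second class object with the same name, bases, and attributes.
# (Hypothetical stand-in for a class re-created during deserialization.)
Rebuilt = type(MyClass.__name__, MyClass.__bases__, namespace)

obj = Rebuilt()

same_name = type(obj).__name__ == MyClass.__name__  # names match
same_object = type(obj) is MyClass                  # but the class objects differ
is_instance = isinstance(obj, MyClass)              # so isinstance fails

print(same_name, same_object, is_instance)
```

    isinstance compares class objects (and their bases), not class names, so two classes that look identical are still unrelated as far as the check is concerned.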
    

    By contrast, the stdlib pickle module stores only a reference to the class (its module and qualified name). On load it looks the class up by that reference rather than creating a new class, so y[0].__class__ is the very same MyClass object and you get the behavior you expect.
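
    To see what "by reference" means for pickle, here is a small sketch. It uses fractions.Fraction rather than MyClass purely as an illustrative assumption, so that the class is importable no matter where the snippet is run.

```python
import pickle
from fractions import Fraction

# Pickling the class itself stores only where to find it:
# the module name and the class name, not the class body.
payload = pickle.dumps(Fraction)
has_module = b"fractions" in payload
has_name = b"Fraction" in payload

# Unpickling looks the class up again and returns the existing object.
restored_class = pickle.loads(payload)
same_class = restored_class is Fraction

# Instances therefore come back as instances of the original class.
value = pickle.loads(pickle.dumps([Fraction(1, 3)]))[0]
still_instance = isinstance(value, Fraction)

print(has_module, has_name, same_class, still_instance)
```

    Because the class is looked up rather than rebuilt, the class must be importable under the same module and name when the data is loaded.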

    To make dill use references, set the byref setting to True.

    With byref=True, dill behaves a lot more like pickle: certain objects (like modules) are pickled by reference instead of dill attempting to pickle the object itself.

    dill.settings['byref'] = True
    x = [ MyClass() ]
    dill.dump(x, path.open('wb'))
    y = dill.load(path.open('rb'))
    print(isinstance(x[0], MyClass)) # True
    print(isinstance(y[0], MyClass)) # True
    

    Alternatively, you can just use pickle from the stdlib instead of dill:

    import pickle

    pickle.dump(x, path.open('wb'))
    y = pickle.load(path.open('rb'))
    print(isinstance(x[0], MyClass)) # True
    print(isinstance(y[0], MyClass)) # True