Is this expected behaviour (and if so, can someone explain why)? This only happens when using dill, not pickle.
from pathlib import Path
import dill

class MyClass:
    def __init__(self) -> None:
        pass

path = Path('test/test.pkl')
# create parent directory if it does not exist
path.parent.mkdir(exist_ok=True)
x = [MyClass()]
# context managers make sure the file is closed before it is read back
with path.open('wb') as f:
    dill.dump(x, f)
with path.open('rb') as f:
    y = dill.load(f)
print(isinstance(x[0], MyClass))  # True
print(isinstance(y[0], MyClass))  # False ???
I was expecting True.
The reason for this is that dill is pickling and re-creating the MyClass class object when deserializing your object. Hence MyClass (which is also x[0].__class__) is a different object from the deserialized y[0].__class__, which causes the isinstance check against MyClass to fail. Comparing the ids shows this:
print(id(MyClass))
# 140430969773264
print(id(x[0].__class__)) # same as above
# 140430969773264
print(id(y[0].__class__)) # different
# 140430969780544
By contrast, the stdlib pickle module stores only a reference to the class (its module and name) and imports the class again when deserializing, rather than creating a new class. That gives the behavior you expect.
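As a rough illustration of what "by reference" means, here is a small stdlib-only sketch. It uses fractions.Fraction purely as a stand-in for any importable class (that choice is mine, not part of the question); with pickle, a class defined in your own script is handled the same way, by module and name.

```python
import pickle
from fractions import Fraction  # stand-in for any importable class

# Pickling a class with the stdlib stores its module and qualified name,
# not the class body.
payload = pickle.dumps(Fraction)
print(b"fractions" in payload, b"Fraction" in payload)

# Loading imports the module and looks the name up again,
# so the result is the very same class object.
restored_class = pickle.loads(payload)
print(restored_class is Fraction)

# Instances therefore still pass isinstance checks after a round trip.
restored = pickle.loads(pickle.dumps([Fraction(1, 3)]))
print(isinstance(restored[0], Fraction))
```

The flip side of this design is that the class must be importable under the same name wherever the data is loaded.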
To make dill use references, set the byref setting to True. With byref=True, dill behaves a lot more like pickle, with certain objects (like modules) pickled by reference as opposed to attempting to pickle the object itself.
dill.settings['byref'] = True
x = [MyClass()]
with path.open('wb') as f:
    dill.dump(x, f)
with path.open('rb') as f:
    y = dill.load(f)
print(isinstance(x[0], MyClass))  # True
print(isinstance(y[0], MyClass))  # True
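If you would rather not change the global setting, dill.dump and dill.dumps also accept byref as a keyword argument for a single call. The sketch below assumes it is run as a script, so that MyClass lives in __main__, and it uses in-memory dumps/loads instead of a file to stay short. No expected output is shown for the default case, since that can depend on the dill version and on where the class is defined.

```python
import dill

class MyClass:
    pass

x = [MyClass()]

# Default settings: a class defined in __main__ may be pickled by value,
# so the loaded instance can belong to a re-created class.
by_value = dill.loads(dill.dumps(x))
print(isinstance(by_value[0], MyClass))

# byref=True applies to this call only and leaves dill.settings untouched.
by_ref = dill.loads(dill.dumps(x, byref=True))
print(isinstance(by_ref[0], MyClass))
```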
Alternatively, you can just use pickle from the stdlib instead of dill:
import pickle
with path.open('wb') as f:
    pickle.dump(x, f)
with path.open('rb') as f:
    y = pickle.load(f)
print(isinstance(x[0], MyClass))  # True
print(isinstance(y[0], MyClass))  # True