If I use dill to pickle a function that relies on a global, that global state somehow isn't respected when the function is loaded again. I don't understand enough about dill to be any more specific, but take this working code for example:
import multiprocessing
import dill

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo

with multiprocessing.Pool(2, initializer) as pool:
    res = pool.map(worker, range(2))

print(res)
This works fine and prints [1, 1] as expected. However, if I instead pickle the initializer and worker functions using dill's recurse=True, and then restore them, it fails:
import multiprocessing
import dill

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo

with open('funcs.pkl', 'wb') as f:
    dill.dump((initializer, worker), f, recurse=True)

with open('funcs.pkl', 'rb') as f:
    initializer, worker = dill.load(f)

with multiprocessing.Pool(2, initializer) as pool:
    res = pool.map(worker, range(2))
This code fails with the following error:
  File "/tmp/ipykernel_158597/1183951641.py", line 9, in worker
    return foo
           ^^^
NameError: name 'foo' is not defined
With recurse=False it works fine, but pickling the functions with recurse=True somehow breaks the code. Why?
With the recurse=True option, dill.dump builds a new globals dict for each function being serialized, holding the objects that the function refers to, which are themselves serialized recursively. The side effect is that when the functions are deserialized with dill.load, these objects are reconstructed as new objects, including each function's globals dict.
This is why, after deserialization, the two functions no longer share a globals dict: each one has its own, so the change that initializer makes to its globals dict has no effect on the globals dict of worker.
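The effect can be reproduced without dill at all. The following is only a rough simulation under my own assumptions, not dill's actual implementation: with_private_globals is a hypothetical helper that rebuilds a function on top of a fresh, empty globals dict using types.FunctionType, which roughly mimics what comes back from dill.load in this case.

```python
import types

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo

def with_private_globals(func):
    # Hypothetical helper (not part of dill): rebuild func with its own
    # brand-new globals dict instead of the module's globals.
    return types.FunctionType(func.__code__, {}, func.__name__)

init_copy = with_private_globals(initializer)
work_copy = with_private_globals(worker)

# 'global foo' inside init_copy writes to init_copy.__globals__ only.
init_copy()

print('foo' in init_copy.__globals__)
print('foo' in work_copy.__globals__)

try:
    work_copy(0)
except NameError as exc:
    # work_copy looks foo up in its own globals dict, where it was never set.
    print('NameError:', exc)
```

After init_copy() runs, foo exists only in the dict that init_copy owns, so work_copy raises the same kind of NameError as in the question.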
You can verify this behavior by checking the identity of the global namespace in which a function is defined and runs, available as the __globals__ attribute of the function object:
import dill

def initializer():
    global foo
    foo = 1

def worker(arg):
    return foo

print(id(globals()))
print(id(initializer.__globals__))
print(id(worker.__globals__))

with open('funcs.pkl', 'wb') as f:
    dill.dump((initializer, worker), f, recurse=True)

with open('funcs.pkl', 'rb') as f:
    initializer, worker = dill.load(f)

print('-- dilled --')
print(id(globals()))
print(id(initializer.__globals__))
print(id(worker.__globals__))
This outputs something like:
124817730351552
124817730351552
124817730351552
-- dilled --
124817730351552
124817727897280
124817728060352