I'm trying to wrap my head around an odd problem I'm facing. Let's say we have the following instance of a defaultdict object:
import dill
from collections import defaultdict
a = 1
b = 2
default_value = a/b
dict_with_default = defaultdict(lambda: default_value)
dict_with_default['a'] = 1.2
dict_with_default['b'] = 3.5
dict_with_default['c'] = 0.25
Let's see what the dictionary looks like:
defaultdict(<function <lambda> at 0x7f860c97be18>, {'a': 1.2, 'b': 3.5, 'c': 0.25})
Before saving it with dill, I want to check that the dictionary works correctly. It does, and 'd' is added:
print("d: ", dict_with_default['d'])
The dictionary looks like this now:
defaultdict(<function <lambda> at 0x7f860c97be18>, {'a': 1.2, 'b': 3.5, 'c': 0.25, 'd': 0.5})
Since I must save the dictionary to a file (in order to transfer it to a different script), I dump it with dill:
with open('./pickles/simple_dict_dill.p', 'wb') as file:
    dill.dump(dict_with_default, file, protocol=dill.HIGHEST_PROTOCOL)
Let's now turn our attention to the different script I've mentioned:
import dill
with open('./pickles/simple_dict_dill.p', 'rb') as file:
    simple_dict_dill = dill.load(file)
print("a:", simple_dict_dill['a'])
print("d:", simple_dict_dill['d'])
print("e:", simple_dict_dill['e']) # gives error
The line print("e:", simple_dict_dill['e']) gives the following error, even though access to a missing key should be handled by lambda: default_value:
NameError: name 'default_value' is not defined
I thought dill can serialize lambda functions, but it turns out there's a problem with it.
I'm the dill author. It's due to how a lambda is serialized when defined outside the global dictionary. You can avoid this error in two ways: (1) use the recurse setting, or (2) define the lambda in the global dict with no dangling pointers.
For example:
>>> import dill
>>> dill.settings['recurse'] = True
>>> from collections import defaultdict
>>> a = 1
>>> b = 2
>>> f = lambda: a/b
>>> d = defaultdict(f)
>>> d['a'] = 3
>>> d['b'] = 4
>>> dill.dumps(d)
b'\x80\x03ccollections\ndefaultdict\nq\x00cdill._dill\n_create_function\nq\x01(cdill._dill\n_load_type\nq\x02X\x08\x00\x00\x00CodeTypeq\x03\x85q\x04Rq\x05(K\x00K\x00K\x00K\x02KCC\x08t\x00t\x01\x1b\x00S\x00q\x06N\x85q\x07X\x01\x00\x00\x00aq\x08X\x01\x00\x00\x00bq\t\x86q\n)X\x07\x00\x00\x00<stdin>q\x0bX\x08\x00\x00\x00<lambda>q\x0cK\x01C\x00q\r))tq\x0eRq\x0f}q\x10(h\x08K\x01h\tK\x02uh\x0cNN}q\x11tq\x12Rq\x13\x85q\x14Rq\x15(h\x08K\x03h\tK\x04u.'
Then I cut and paste the string that's produced into a new Python session (instead of writing it to a file; it's the same thing).
$ python
Python 3.6.9 (default, Jul 6 2019, 02:58:03)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> d = dill.loads(b'\x80\x03ccollections\ndefaultdict\nq\x00cdill._dill\n_create_function\nq\x01(cdill._dill\n_load_type\nq\x02X\x08\x00\x00\x00CodeTypeq\x03\x85q\x04Rq\x05(K\x00K\x00K\x00K\x02KCC\x08t\x00t\x01\x1b\x00S\x00q\x06N\x85q\x07X\x01\x00\x00\x00aq\x08X\x01\x00\x00\x00bq\t\x86q\n)X\x07\x00\x00\x00<stdin>q\x0bX\x08\x00\x00\x00<lambda>q\x0cK\x01C\x00q\r))tq\x0eRq\x0f}q\x10(h\x08K\x01h\tK\x02uh\x0cNN}q\x11tq\x12Rq\x13\x85q\x14Rq\x15(h\x08K\x03h\tK\x04u.')
>>> d
defaultdict(<function <lambda> at 0x101d8d048>, {'a': 3, 'b': 4})
>>> d['a']
3
>>> d['c']
0.5
>>>
If you don't use the recurse setting, then you'd have to use:
>>> f = lambda: 1/2
>>> d = defaultdict(f)
when you create the defaultdict.
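If the default is computed from variables, as in the question, one way to get a lambda with no references to globals is to bind the value as a default argument when the lambda is defined. The sketch below uses only the standard library. The parameter name `_v` is hypothetical, and rebuilding the function against an empty globals dict is just a rough stand-in for loading it in another script, not what dill literally does. It also assumes dill carries a function's default arguments along when it serializes the lambda.

```python
import types
from collections import defaultdict

a = 1
b = 2
default_value = a / b

# Bind the current value as a default argument (hypothetical name _v),
# so the lambda body reads a local instead of the global 'default_value'.
f = lambda _v=default_value: _v

d = defaultdict(f)
d['a'] = 1.2

# The value is stored on the function object, and the code object
# references no global names at all.
defaults = f.__defaults__
global_names = f.__code__.co_names

# Rebuild the function against an empty globals dict to mimic a script
# in which 'default_value' was never defined; the call still succeeds.
g = types.FunctionType(f.__code__, {}, f.__name__, f.__defaults__)
result = g()

# Accessing a missing key uses the bound default.
missing = d['z']
print(defaults, global_names, result, missing)
```

The trade-off is that the value is frozen at definition time: later changes to `a`, `b`, or `default_value` are not picked up by the lambda.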
Of course, dill should be able to deal with the pointer references from a lambda better, but there are cases where it can't yet.
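The NameError from the question can be reproduced without dill, which may help show what a dangling reference means here. A lambda that reads a global stores only the name and looks it up at call time in its globals dict. The sketch below rebuilds the same code object against an empty dict, as a rough stand-in (an assumption, not dill's actual loading code) for a script that never defined default_value.

```python
import types

default_value = 0.5
f = lambda: default_value

# The code object records the *name* of the global it reads, not its value.
referenced = f.__code__.co_names

# Same code object, but with a globals dict that lacks 'default_value'.
g = types.FunctionType(f.__code__, {})

try:
    g()
    message = None
except NameError as exc:
    # The lookup fails at call time, as in the second script.
    message = str(exc)

print(referenced)
print(message)
```

Both fixes in the answer address this lookup: the recurse setting ships the referenced globals along with the function, while a lambda such as lambda: 1/2 references no globals to begin with.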