Python's dill one-time use of default argument in a defaultdict


I'm trying to wrap my head around an odd problem. Say we have the following defaultdict instance:

import dill
from collections import defaultdict

a = 1
b = 2
default_value = a/b

dict_with_default = defaultdict(lambda: default_value)

dict_with_default['a'] = 1.2
dict_with_default['b'] = 3.5
dict_with_default['c'] = 0.25

Here is what the dictionary looks like:

defaultdict(<function <lambda> at 0x7f860c97be18>, {'a': 1.2, 'b': 3.5, 'c': 0.25})

Before saving it with dill, I check that the dictionary works correctly. It does, and 'd' is added:

print("d: ", dict_with_default['d'])

The dictionary looks like this now:

defaultdict(<function <lambda> at 0x7f860c97be18>, {'a': 1.2, 'b': 3.5, 'c': 0.25, 'd': 0.5})

Since I need to transfer the dictionary to a different script, I save it to a file with dill:

with open('./pickles/simple_dict_dill.p', 'wb') as file:
    dill.dump(dict_with_default, file, protocol=dill.HIGHEST_PROTOCOL)

Let's now turn our attention to the different script I've mentioned:

import dill

with open('./pickles/simple_dict_dill.p', 'rb') as file:
    simple_dict_dill = dill.load(file)

print("a:", simple_dict_dill['a'])
print("d:", simple_dict_dill['d'])
print("e:", simple_dict_dill['e']) # gives error

The line print("e:", simple_dict_dill['e']) gives the following error, even though access to absent keys should be handled by lambda: default_value:

NameError: name 'default_value' is not defined

I thought dill could serialize lambda functions, but it turns out there's a problem with this one.
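
The difference between 'd' and 'e' can be reproduced without dill. The sketch below is only an illustration: it assumes that the lambda looks up default_value by name each time it is called, and it deletes the global to stand in for the second script, where that name was never defined.

```python
from collections import defaultdict

default_value = 1 / 2
dict_with_default = defaultdict(lambda: default_value)

# Accessing a missing key calls the lambda once and stores the result,
# so 'd' becomes an ordinary entry in the dictionary.
first = dict_with_default['d']

# Stand-in for the second script: the name no longer exists.
del default_value

# 'd' is read from the stored data; the lambda is not called again.
stored = dict_with_default['d']

# 'e' is missing, so the lambda runs and has to look up the name.
try:
    dict_with_default['e']
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(stored)
print(error_message)
```

Read this way, 'd' survives the round trip because it was already stored as a plain value before dumping, while 'e' still depends on the lambda being able to find default_value.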


Solution

  • I'm the dill author. The error comes from how dill serializes a lambda that refers to a global name: by default the reference to default_value is stored, not its value, and that name does not exist in the second script. You can avoid this error in two ways: (1) use the recurse setting, or (2) define the lambda in the global dict with no dangling pointers.

    For example:

    >>> import dill
    >>> dill.settings['recurse'] = True
    >>> from collections import defaultdict
    >>> a = 1
    >>> b = 2
    >>> f = lambda: a/b
    >>> d = defaultdict(f)
    >>> d['a'] = 3
    >>> d['b'] = 4
    >>> dill.dumps(d)
    b'\x80\x03ccollections\ndefaultdict\nq\x00cdill._dill\n_create_function\nq\x01(cdill._dill\n_load_type\nq\x02X\x08\x00\x00\x00CodeTypeq\x03\x85q\x04Rq\x05(K\x00K\x00K\x00K\x02KCC\x08t\x00t\x01\x1b\x00S\x00q\x06N\x85q\x07X\x01\x00\x00\x00aq\x08X\x01\x00\x00\x00bq\t\x86q\n)X\x07\x00\x00\x00<stdin>q\x0bX\x08\x00\x00\x00<lambda>q\x0cK\x01C\x00q\r))tq\x0eRq\x0f}q\x10(h\x08K\x01h\tK\x02uh\x0cNN}q\x11tq\x12Rq\x13\x85q\x14Rq\x15(h\x08K\x03h\tK\x04u.'
    

    Then I cut and paste the resulting byte string (instead of writing it to a file; it's the same thing) into a new Python session.

    $ python
    Python 3.6.9 (default, Jul  6 2019, 02:58:03) 
    [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dill
    >>> d = dill.loads(b'\x80\x03ccollections\ndefaultdict\nq\x00cdill._dill\n_create_function\nq\x01(cdill._dill\n_load_type\nq\x02X\x08\x00\x00\x00CodeTypeq\x03\x85q\x04Rq\x05(K\x00K\x00K\x00K\x02KCC\x08t\x00t\x01\x1b\x00S\x00q\x06N\x85q\x07X\x01\x00\x00\x00aq\x08X\x01\x00\x00\x00bq\t\x86q\n)X\x07\x00\x00\x00<stdin>q\x0bX\x08\x00\x00\x00<lambda>q\x0cK\x01C\x00q\r))tq\x0eRq\x0f}q\x10(h\x08K\x01h\tK\x02uh\x0cNN}q\x11tq\x12Rq\x13\x85q\x14Rq\x15(h\x08K\x03h\tK\x04u.')
    >>> d
    defaultdict(<function <lambda> at 0x101d8d048>, {'a': 3, 'b': 4})
    >>> d['a']
    3
    >>> d['c']
    0.5
    >>>
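
For the file-based workflow in the question, the same idea might look like the sketch below. It assumes a dill version whose dump accepts a recurse keyword (so the global setting does not have to be changed), uses a temporary directory instead of the ./pickles/ folder, and deletes the global before loading to mimic a script that never defined it.

```python
import os
import tempfile
from collections import defaultdict

import dill

a = 1
b = 2
default_value = a / b

dict_with_default = defaultdict(lambda: default_value)
dict_with_default['a'] = 1.2

# Hypothetical location; the question writes to ./pickles/ instead.
path = os.path.join(tempfile.mkdtemp(), 'simple_dict_dill.p')

with open(path, 'wb') as file:
    # recurse=True asks dill to trace the globals the lambda refers to
    # and store them with the function, for this call only.
    dill.dump(dict_with_default, file,
              protocol=dill.HIGHEST_PROTOCOL, recurse=True)

# Mimic the second script, where default_value was never defined.
del default_value

with open(path, 'rb') as file:
    loaded = dill.load(file)

value_a = loaded['a']
value_e = loaded['e']  # missing key: the restored lambda supplies the default
print(value_a, value_e)
```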
    

    If you don't use the recurse setting, then you'd have to use:

    >>> f = lambda: 1/2
    >>> d = defaultdict(f)
    

    when you create the defaultdict.

    Of course, dill should be able to deal with the pointer references from a lambda better, but there are cases where it can't yet.
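
A related way to avoid dangling references, which is not part of the answer above, is to carry the default inside a small callable object instead of a lambda. The sketch below is one possible shape under that assumption: ConstantFactory is a hypothetical helper name, and plain pickle is used only to keep the example free of third-party dependencies.

```python
import pickle
from collections import defaultdict


class ConstantFactory:
    """Callable that returns a fixed value stored on the instance."""

    def __init__(self, value):
        self.value = value

    def __call__(self):
        return self.value


a = 1
b = 2
dict_with_default = defaultdict(ConstantFactory(a / b))
dict_with_default['a'] = 1.2

# Round trip through bytes; the default travels as instance state,
# so no global name is needed when the factory is called later.
payload = pickle.dumps(dict_with_default)
restored = pickle.loads(payload)

value_a = restored['a']
value_e = restored['e']
print(value_a, value_e)
```

One caveat with this design: plain pickle stores the class by reference, so the loading script would need to be able to import the same ConstantFactory definition.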