Why does defaultdict default_factory default to None?


You don't have to specify a default factory (passing None explicitly has the same effect):

>>> from collections import defaultdict
>>> defaultdict()
defaultdict(None, {})
>>> defaultdict(None)
defaultdict(None, {})

Why None, though? It leads to this:

>>> dd = defaultdict()
>>> dd[0]
# TypeError: 'NoneType' object is not callable  <-- expected behaviour
# KeyError: 0                                   <-- actual behaviour

None is even explicitly allowed: if you try to build a defaultdict from some other non-callable object, say defaultdict(0), an explicit check fails with

TypeError: first argument must be callable or None

I thought something like lambda: None would be a better default factory. Why is the default_factory optional? I don't understand the use-case.


Solution

  • When Guido van Rossum initially proposed a DefaultDict it had a default value (unlike the current defaultdict which uses a callable rather than a value) that was set during construction and was read-only (also unlike defaultdict).

    After some discussion, Guido revised the proposal. Here are the relevant highlights:

    Many, many people suggested to use a factory function instead of a default value. This is indeed a much better idea (although slightly more cumbersome for the simplest cases).

    ...

    Let's add a generic missing-key handling method to the dict class, as well as a default_factory slot initialized to None.

    ...

    [T]he default implementation is designed so that we can write

    d = {}
    d.default_factory = list
    

    The important thing to note is that, in this revision, the new functionality no longer belongs to a subclass but to dict itself. Accepting the default_factory in the dict constructor would therefore break existing code, so by design the default_factory had to be set after the dict was created. Its initial value is None, and it is now a mutable attribute so that it can be meaningfully overwritten.

    After yet more discussion, it was decided that maybe it would be best not to complicate the regular dict type with a defaultdict specialization.
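The "generic missing-key handling method" from that proposal can be illustrated with the `__missing__` hook, which dict lookups call when a dict subclass defines it. The following is only a rough sketch of the idea in pure Python, not how collections.defaultdict is actually implemented; the class name FactoryDict is made up for illustration.

```python
class FactoryDict(dict):  # hypothetical name, for illustration only
    # Mirrors the proposal: the slot starts out as None.
    default_factory = None

    def __missing__(self, key):
        # dict.__getitem__ calls this hook only when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value


d = FactoryDict()
try:
    d["x"]
except KeyError:
    print("no factory set, so a missing key is still a KeyError")

d.default_factory = list  # set after construction, as in the proposal
d["x"].append(1)
print(d)  # {'x': [1]}
```

With the factory left at None, the object behaves like an ordinary dict, which is why None works as a neutral initial value.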

    Steven Bethard then asked for clarification regarding the constructor:

    Should default_factory be an argument to the constructor? The three answers I see:

    • "No." I'm not a big fan of this answer. Since the whole point of creating a defaultdict type is to provide a default, requiring two statements (the constructor call and the default_factory assignment) to initialize such a dictionary seems a little inconvenient.
    • "Yes and it should be followed by all the normal dict constructor arguments." This is okay, but a few errors, like defaultdict({1:2}) will pass silently (until you try to use the dict, of course).
    • "Yes and it should be the only constructor argument." This is my favorite mainly because I think it's simple, and I couldn't think of good examples where I really wanted to do defaultdict(list, some_dict_or_iterable) or defaultdict(list, **some_keyword_args). It's also forward compatible if we need to add some of the dict constructor args in later.

    Guido van Rossum decided that:

    The defaultdict signature takes an optional positional argument which is the default_factory, defaulting to None. The remaining positional and all keyword arguments are passed to the dict constructor. IOW:

    d = defaultdict(list, [(1, 2)])
    

    is equivalent to:

    d = defaultdict()
    d.default_factory = list
    d.update([(1, 2)])
    

    Note that the expanded code mirrors exactly how it worked when Guido was considering altering dict to provide the defaultdict behavior.

    He also provides some justifications upthread:

    Even if the default_factory were passed to the constructor, it still ought to be a writable attribute so it can be introspected and modified. A defaultdict that can't change its default factory after its creation is less useful.

    Bengt Richter explains why you might want a mutable default factory:

    My guess is that realistically default_factory will be used to make clean code for filling a dict, and then turning the factory off if it's to be passed into unknown contexts. Those contexts can then use old code to do as above, or if worth it can temporarily set a factory to do some work. Tightly coupled code I guess could pass factory-enabled dicts between each other.
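The pattern Bengt Richter describes might look like the sketch below: use the factory while filling the dict, then set default_factory to None before handing the dict to code that expects plain dict behaviour. The word list and the name groups are illustrative only.

```python
from collections import defaultdict

# Fill phase: the factory keeps the grouping code short.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# "Turn the factory off" before passing the dict to unknown code.
groups.default_factory = None

try:
    groups["z"]
except KeyError:
    print("missing key raises KeyError again")

print("z" in groups)  # False: the failed lookup did not insert a key
print(dict(groups))   # {'a': ['apple', 'avocado'], 'b': ['banana']}
```

This is also the use case for the KeyError in the question: with default_factory set to None, a defaultdict looks up missing keys exactly like a regular dict instead of silently growing.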