Search code examples
pythonpython-3.xdictionarydefaultdict

clarify behaviour: collections.defaultdict vs dict.setdefault


dict provides .setdefault(), which will allow you to assign values, of any type, to missing keys on the fly:

>>> d = dict()
>>> d.setdefault('missing_key', [])
[]
>>> d
{'missing_key': []}

Whereas, if you use defaultdict to accomplish the same task, then the default value is generated on demand whenever you try to access or modify a missing key:

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d['missing_key']
[]
>>> d
defaultdict(<class 'list'>, {'missing_key': []})

However, the following piece of code implemented with defaultdict raises a KeyError instead of creating the item with default value, {}:

trie = collections.defaultdict(dict)
for word in words:
    t = trie
    for c in word:
        t = t[c]
    t["*"] = word

Using .setdefault() works ok:

trie = {}
for word in words:
    t = trie
    for c in word:
        t = t.setdefault(c, {})
    t["*"] = word

checking before access, works ok too:

trie = {}
for word in words:
    t = trie
    for c in word:
        if c not in t:
           t[c] = {}
        t = t[c]
    t["*"] = word

What am I missing when using collections.defaultdict()?

NB I am trying to build a Trie structure out of a list of words. For example:

words = ["oath", "pea", "eat", "rain"]
trie = {'o': {'a': {'t': {'h': {'*': 'oath'}}}}, 'p': {'e': {'a': {'*': 'pea'}}}, 'e': {'a': {'t': {'*': 'eat'}}}, 'r': {'a': {'i': {'n': {'*': 'rain'}}}}}

Solution

  • In your first example, when you do t = t[c], t becomes a regular empty dict (because that's what you tell the defaultdict to generate in the definition of trie).

    Let's run through the loop with your example word "oath":

    1) t = trie, word = "oath"
    2) c = "o"
    3) t = t[c]
      3.1) evaluation of t[c] # "o" is not in trie, so trie generates an empty dict at key "o" and returns it to you
      3.2) assignment to t -> t is now the empty dict. If you were to run (t is trie["o"]), it would evaluate to True after this line
    4) c = "a"
    5) t = t[c]
      5.1) Evaluation of t[c] -> "a" is not in the dict t. This is a regular dict, raise KeyError.
    

    Unfortunately, I can't think of a way to use defaultdict here (but Marius could, see this answer), because of the arbitrary nesting of a Trie. You'd need to define the trie as a defaultdict that, in case of missing key, generates a default dict that itself generates a default dict in case of missing key, recursively until maximum depth (which, in principle, is unknown).

    IMO, the best way to implement this is with the setdefault as you did in your second example.