dict
provides .setdefault()
, which will allow you to assign values, of any type, to missing keys on the fly:
>>> d = dict()
>>> d.setdefault('missing_key', [])
[]
>>> d
{'missing_key': []}
Whereas, if you use defaultdict
to accomplish the same task, then the default value is generated on demand whenever you try to access or modify a missing key:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d['missing_key']
[]
>>> d
defaultdict(<class 'list'>, {'missing_key': []})
However, the following piece of code implemented with defaultdict
raises a KeyError
instead of creating the item with default value, {}
:
trie = collections.defaultdict(dict)
for word in words:
t = trie
for c in word:
t = t[c]
t["*"] = word
Using .setdefault()
works ok:
trie = {}
for word in words:
t = trie
for c in word:
t = t.setdefault(c, {})
t["*"] = word
checking before access, works ok too:
trie = {}
for word in words:
t = trie
for c in word:
if c not in t:
t[c] = {}
t = t[c]
t["*"] = word
What am I missing when using collections.defaultdict()
?
NB I am trying to build a Trie structure out of a list of words. For example:
words = ["oath", "pea", "eat", "rain"]
trie = {'o': {'a': {'t': {'h': {'*': 'oath'}}}}, 'p': {'e': {'a': {'*': 'pea'}}}, 'e': {'a': {'t': {'*': 'eat'}}}, 'r': {'a': {'i': {'n': {'*': 'rain'}}}}}
In your first example, when you do t = t[c], t becomes a regular empty dict
(because that's what you tell the defaultdict
to generate in the definition of trie
).
Let's run through the loop with your example word "oath"
:
1) t = trie, word = "oath"
2) c = "o"
3) t = t[c]
3.1) evaluation of t[c] # "o" is not in trie, so trie generates an empty dict at key "o" and returns it to you
3.2) assignment to t -> t is now the empty dict. If you were to run (t is trie["o"]), it would evaluate to True after this line
4) c = "a"
5) t = t[c]
5.1) Evaluation of t[c] -> "a" is not in the dict t. This is a regular dict, raise KeyError.
Unfortunately, I can't think of a way to use defaultdict
here (but Marius could, see this answer), because of the arbitrary nesting of a Trie. You'd need to define the trie as a defaultdict that, in case of missing key, generates a default dict that itself generates a default dict in case of missing key, recursively until maximum depth (which, in principle, is unknown).
IMO, the best way to implement this is with the setdefault
as you did in your second example.