I was checking out Peter Norvig's code on how to write simple spell checkers. At the beginning, he uses this code to insert words into a dictionary.
    import collections

    def train(features):
        # Unseen words start at 1, so every key ends up at (occurrences + 1).
        model = collections.defaultdict(lambda: 1)
        for f in features:
            model[f] += 1
        return model
What is the difference between a Python dict and the one that was used here? In addition, what is the lambda for? I checked the API documentation here and it says that defaultdict is actually derived from dict, but how does one decide which one to use?
The difference is that a defaultdict will "default" a value if that key has not been set yet. If you didn't use a defaultdict, you'd have to check whether that key exists and, if it doesn't, set it to what you want.
The lambda defines a factory for the default value. The defaultdict calls that function, with no arguments, whenever it needs a default value for a missing key. You could hypothetically have a more complicated default function.
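To make the comparison concrete, here is a minimal sketch of the same counting loop written both ways. The word list and the names plain and counts are made up for illustration; only the lambda: 1 factory comes from the code in the question.

```python
import collections

words = ["the", "cat", "the"]  # hypothetical sample input

# Plain dict: you have to check for the key yourself before incrementing.
plain = {}
for w in words:
    if w not in plain:
        plain[w] = 1  # same starting value as the lambda in train()
    plain[w] += 1

# defaultdict: the factory supplies the starting value on first access.
counts = collections.defaultdict(lambda: 1)
for w in words:
    counts[w] += 1

print(plain)         # {'the': 3, 'cat': 2}
print(dict(counts))  # {'the': 3, 'cat': 2}
```

Both loops end up with the same contents; the defaultdict version just moves the "is the key there yet?" check into the dictionary itself.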
Help on class defaultdict in module collections:
class defaultdict(__builtin__.dict)
| defaultdict(default_factory) --> dict with default factory
|
| The default factory is called without arguments to produce
| a new value when a key is not present, in __getitem__ only.
| A defaultdict compares equal to a dict with the same items.
|
(from help(type(collections.defaultdict())))
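The "in __getitem__ only" part of that help text is easy to miss, so here is a small sketch of what it means in practice. The key "dog" is just an example; the point is that only indexing with square brackets triggers the factory.

```python
import collections

model = collections.defaultdict(lambda: 1)

print("dog" in model)       # False: membership tests do not call the factory
print(model.get("dog"))     # None: .get() bypasses the factory too
print(len(model))           # 0: nothing has been inserted so far

print(model["dog"])         # 1: indexing calls the factory and stores the result
print(len(model))           # 1: the key now exists
print(model == {"dog": 1})  # True: compares equal to a plain dict
```

One consequence is that merely reading model[key] for a missing key inserts that key, which can matter if you later iterate over the dictionary.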
{}.setdefault is similar in nature, but takes a value instead of a factory function. It's used to set the value only if the key doesn't already exist, and you have to call it explicitly at each lookup, so it's a bit different.
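For comparison, a rough sketch of the same counting loop using setdefault on a plain dict. The sample words and the groups example are hypothetical and only there to show the call.

```python
# setdefault(key, value) stores value only if key is missing,
# then returns whatever is stored under key.
plain = {}
for w in ["the", "cat", "the"]:
    plain.setdefault(w, 1)  # no effect the second time "the" is seen
    plain[w] += 1
print(plain)  # {'the': 3, 'cat': 2}

# The default is an already-built value, not a factory, so the []
# below is created on every call even when the key already exists.
groups = {}
groups.setdefault("a", []).append(1)
groups.setdefault("a", []).append(2)
print(groups)  # {'a': [1, 2]}
```

With a defaultdict the default is declared once, when the dictionary is created; with setdefault it is repeated at every call site.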