What is the difference between dict and collections.defaultdict?


I was checking out Peter Norvig's code on how to write a simple spell checker. At the beginning, he uses this code to insert words into a dictionary.

import collections

def train(features):
    # Every word starts at 1 the first time it is seen, then gets incremented.
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

What is the difference between a Python dict and the one that was used here? In addition, what is the lambda for? I checked the API documentation, which says that defaultdict is derived from dict, but how does one decide which one to use?


Solution

  • The difference is that a defaultdict supplies a default value for any key that has not been set yet. With a plain dict you would have to check whether the key exists and, if it doesn't, set it to the value you want.

    The lambda defines a factory for the default value. The defaultdict calls that function, with no arguments, whenever a missing key is looked up, so in the code above every new word starts at 1. The factory can be any callable, including a more complicated function.

    Help on class defaultdict in module collections:
    
    class defaultdict(__builtin__.dict)
     |  defaultdict(default_factory) --> dict with default factory
     |  
     |  The default factory is called without arguments to produce
     |  a new value when a key is not present, in __getitem__ only.
     |  A defaultdict compares equal to a dict with the same items.
     |  
    

    (from help(type(collections.defaultdict())))

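    dict.setdefault is similar in nature, but takes a value instead of a factory function. It sets the key to that value only if the key does not already exist, and returns whatever is then stored under the key. That is a bit different, though: the default is given on each call rather than once for the whole dictionary.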
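
To make the comparison concrete, here is a minimal sketch of the same counting loop written three ways: with a plain dict and an explicit key check, with a defaultdict, and with setdefault. The word list and the variable names (words, plain, auto, via_setdefault, probe) are made up for illustration and are not from Norvig's code. The last few lines also show the "in __getitem__ only" behaviour quoted in the help text.

```python
import collections

# Hypothetical sample input, only for illustration.
words = ["spam", "eggs", "spam"]

# Plain dict: check for the key yourself before updating it.
plain = {}
for w in words:
    if w not in plain:
        plain[w] = 1  # same starting value as "lambda: 1" in the question
    plain[w] += 1

# defaultdict: the factory supplies the starting value on first lookup.
auto = collections.defaultdict(lambda: 1)
for w in words:
    auto[w] += 1

# setdefault: pass the default value itself instead of a factory.
via_setdefault = {}
for w in words:
    via_setdefault[w] = via_setdefault.setdefault(w, 1) + 1

print(plain)                   # {'spam': 3, 'eggs': 2}
print(auto == plain)           # True: a defaultdict compares equal to a dict
print(via_setdefault == plain) # True

# The factory runs only for indexing (__getitem__), not for get() or "in".
probe = collections.defaultdict(lambda: 1)
print(probe.get("ham"))  # None, and "ham" is not inserted
print("ham" in probe)    # False
print(probe["ham"])      # 1, and now "ham" is stored
print("ham" in probe)    # True
```

As a rough rule of thumb, a defaultdict fits when every missing key should get the same kind of starting value, while a plain dict (with a key check, get, or setdefault) fits when a lookup of a missing key should not silently create an entry.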