python dictionary defaultdict dictionary-missing

A forgiving dictionary

I am wondering how to create forgiving dictionary (one that returns a default value if a KeyError is raised).

In the following code example I would get a KeyError; for example

a = {'one':1,'two':2}
print a['three']

In order not to get one I would 1. Have to catch the exception or use get.

I would like to not to have to do that with my dictionary...

Solution

import collections
a = collections.defaultdict(lambda: 3)
a.update({'one':1,'two':2})
print a['three']

emits 3 as required. You could also subclass dict yourself and override __missing__, but that doesn't make much sense when the defaultdict behavior (ignoring the exact missing key that's being looked up) suits you so well...

Edit ...unless, that is, you're worried about a growing by one entry each time you look up a missing key (which is part of defaultdict's semantics) and would rather get slower behavior but save some memory. For example, in terms of memory...:

>>> import sys
>>> a = collections.defaultdict(lambda: 'blah')
>>> print len(a), sys.getsizeof(a)
0 140
>>> for i in xrange(99): _ = a[i]
... 
>>> print len(a), sys.getsizeof(a)
99 6284

...the defaultdict, originally empty, now has the 99 previously-missing keys that we looked up, and takes 6284 bytes (vs. the 140 bytes it took when it was empty).

The alternative approach...:

>>> class mydict(dict):
...   def __missing__(self, key): return 3
... 
>>> a = mydict()
>>> print len(a), sys.getsizeof(a)
0 140
>>> for i in xrange(99): _ = a[i]
... 
>>> print len(a), sys.getsizeof(a)
0 140

...entirely saves this memory overhead, as you see. Of course, performance is another issue:

$ python -mtimeit -s'import collections; a=collections.defaultdict(int); r=xrange(99)' 'for i in r: _=a[i]'
100000 loops, best of 3: 14.9 usec per loop

$ python -mtimeit -s'class mydict(dict):
>   def __missing__(self, key): return 0
> ' -s'a=mydict(); r=xrange(99)' 'for i in r: _=a[i]'
10000 loops, best of 3: 92.9 usec per loop

Since defaultdict adds the (previously-missing) key on lookup, it gets much faster when such a key is next looked up, while mydict (which overrides __missing__ to avoid that addition) pays the "missing key lookup overhead" every time.

Whether you care about either issue (performance vs memory footprint) entirely depends on your specific use case, of course. It is in any case a good idea to be aware of the tradeoff!-)