Search code examples
pythondictionarynested

Nested dictionaries in python with error when accessing non-existent key


I'm using nested dictionaries as implemented using the AutoVivification class answer at What is the best way to implement nested dictionaries?; namely

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

a = AutoVivification()
a['foo']['bar'] = 'spam'

thus permitting arbitrary nesting in the dictionary. Is there a way to modify the class such that one could assign values to members using an arbitrary set of keys, but only previously defined sets of keys are allowed when trying to access / read from the member? For example,

print a['foo']['bar']
print a['foo']['eggs']

currently outputs

spam
{}

It would be great if the second one gave an error instead, as a['foo']['eggs'] hasn't been defined...


Solution

  • The problem you'll run into is that, in order to set an item on a nested dictionary, you must first be able to get all the parent items. For example:

    d[1][2][3] = 42
    

    requires getting d[1][2] in order to set d[1][2][3]. There's no way to know whether an assignment is in process when you are accessing the intermediate dictionaries, so the only way to get assignment to work is to always create the sub-dictionary on access. (You could return some kind of proxy object instead of creating the sub-dictionary, and defer creation of the intermediate dictionaries until assignment, but you're still not going to get an error when you access a path that doesn't exist.)

    The simplest way around this is to use a single tuple key rather than repeated subkeys. In other words, instead of setting d[1][2][3] you would set d[1, 2, 3]. Assignments then are self-contained operations: they do not require getting any intermediate nesting levels, so you can create the intermediate levels only on assignment.

    As a bonus, you may find working with a tuple much more straightforward when passing multiple keys around, since you can just stick 'em in the [] and get the item you want.

    You could do this with a single dictionary, using the tuples as keys. However, this loses the hierarchical structure of the data. The implementation below uses sub-dictionaries. A dictionary subclass called node is used so that we can assign an attribute on the dictionary to represent the value of the node at that location; this way, we can store values on the intermediate nodes as well as at the leaves. (It has a __repr__ method that shows both the node's value and its children, if it has any.) The __setitem__ method of the tupledict class handles creating intermediate nodes when assigning an element. __getitem__ traverses the nodes to find the value you want. (If you want to access the individual nodes as nodes, you can use get() to reach them one at a time.)

    class tupledict(dict):
    
        class node(dict):
            def __repr__(self):
                if self:
                    if hasattr(self, "value"):
                        return repr(self.value) + ", " + dict.__repr__(self)
                    return dict.__repr__(self)
                else:
                    return repr(self.value)
    
        def __init__(self):
            pass
    
        def __setitem__(self, key, value):
            if not isinstance(key, tuple):   # handle single value
                key = [key]
            d = self
            for k in key:
                if k not in d:
                    dict.__setitem__(d, k, self.node())
                d = dict.__getitem__(d, k)
            d.value = value
    
        def __getitem__(self, key):
            if not isinstance(key, tuple):
                key = [key]
            d = self
            for k in key:
                try:
                    d = dict.__getitem__(d, k)
                except KeyError:
                    raise KeyError(key[0] if len(key) == 1 else key)
            try:
                return d.value
            except AttributeError:
                raise KeyError(key[0] if len(key) == 1 else key)
    

    Usage:

    td = tupledict()
    td['foo', 'bar'] = 'spam'
    td['foo', 'eggs']   # KeyError
    
    key = 'foo', 'bar'
    td[key]    # 'spam'