I'm using nested dictionaries as implemented using the AutoVivification class answer at What is the best way to implement nested dictionaries?; namely
class AutoVivification(dict):
"""Implementation of perl's autovivification feature."""
def __getitem__(self, item):
try:
return dict.__getitem__(self, item)
except KeyError:
value = self[item] = type(self)()
return value
a = AutoVivification()
a['foo']['bar'] = 'spam'
thus permitting arbitrary nesting in the dictionary. Is there a way to modify the class such that one could assign values to members using an arbitrary set of keys, but only previously defined sets of keys are allowed when trying to access / read from the member? For example,
print a['foo']['bar']
print a['foo']['eggs']
currently outputs
spam
{}
It would be great if the second one gave an error instead, as a['foo']['eggs'] hasn't been defined...
The problem you'll run into is that, in order to set an item on a nested dictionary, you must first be able to get all the parent items. For example:
d[1][2][3] = 42
requires getting d[1][2]
in order to set d[1][2][3]
. There's no way to know whether an assignment is in process when you are accessing the intermediate dictionaries, so the only way to get assignment to work is to always create the sub-dictionary on access. (You could return some kind of proxy object instead of creating the sub-dictionary, and defer creation of the intermediate dictionaries until assignment, but you're still not going to get an error when you access a path that doesn't exist.)
The simplest way around this is to use a single tuple key rather than repeated subkeys. In other words, instead of setting d[1][2][3]
you would set d[1, 2, 3]
. Assignments then are self-contained operations: they do not require getting any intermediate nesting levels, so you can create the intermediate levels only on assignment.
As a bonus, you may find working with a tuple much more straightforward when passing multiple keys around, since you can just stick 'em in the []
and get the item you want.
You could do this with a single dictionary, using the tuples as keys. However, this loses the hierarchical structure of the data. The implementation below uses sub-dictionaries. A dictionary subclass called node
is used so that we can assign an attribute on the dictionary to represent the value of the node at that location; this way, we can store values on the intermediate nodes as well as at the leaves. (It has a __repr__
method that shows both the node's value and its children, if it has any.) The __setitem__
method of the tupledict
class handles creating intermediate nodes when assigning an element. __getitem__
traverses the nodes to find the value you want. (If you want to access the individual nodes as nodes, you can use get()
to reach them one at a time.)
class tupledict(dict):
class node(dict):
def __repr__(self):
if self:
if hasattr(self, "value"):
return repr(self.value) + ", " + dict.__repr__(self)
return dict.__repr__(self)
else:
return repr(self.value)
def __init__(self):
pass
def __setitem__(self, key, value):
if not isinstance(key, tuple): # handle single value
key = [key]
d = self
for k in key:
if k not in d:
dict.__setitem__(d, k, self.node())
d = dict.__getitem__(d, k)
d.value = value
def __getitem__(self, key):
if not isinstance(key, tuple):
key = [key]
d = self
for k in key:
try:
d = dict.__getitem__(d, k)
except KeyError:
raise KeyError(key[0] if len(key) == 1 else key)
try:
return d.value
except AttributeError:
raise KeyError(key[0] if len(key) == 1 else key)
Usage:
td = tupledict()
td['foo', 'bar'] = 'spam'
td['foo', 'eggs'] # KeyError
key = 'foo', 'bar'
td[key] # 'spam'