Tags: python, python-3.x, reduce, functools

Why does reducing getitem over a nested data structure fail?


Setup: I wanted to write a method that would take a nested data object and a path string, and attempt to use the path components to dereference a location inside the data object.

For example, you'd have a path like /alpha/bravo/0/charlie, and the method would return data_obj['alpha']['bravo'][0]['charlie'] if that was a defined location, or do something else (raise an exception, log a warning, return None, whatever) if it wasn't.
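Before any lookup happens, the path string has to become a list of keys. Here is a minimal sketch. It assumes that any component made only of digits is a list index. The helper name `path_to_keys` is hypothetical.

```python
def path_to_keys(path):
    """Split a slash-separated path into lookup keys.

    Assumption (hypothetical convention): a component made only of digits
    is a list index and becomes an int; everything else stays a str key.
    """
    keys = []
    for part in path.strip("/").split("/"):
        if not part:
            continue  # skip empty components produced by doubled slashes
        keys.append(int(part) if part.isdigit() else part)
    return keys


print(path_to_keys("/alpha/bravo/0/charlie"))  # ['alpha', 'bravo', 0, 'charlie']
```

This convention has a cost: a dict that really uses the string "0" as a key cannot be reached through such a path.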

Attempt: I suspected there was a fairly simple way to do this. While searching I found this answer, which suggests combining functools.reduce with operator.getitem to traverse an arbitrarily deep dictionary. I wanted to adapt it to dicts that can contain nested lists. While experimenting, I found that nested getitem calls work fine, but combining getitem with reduce produces a confusing type mismatch, as demonstrated below.
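The dict-only version of the technique works because `reduce(f, keys, start)` is a left fold. It computes `f(f(f(start, k0), k1), k2)` and so on. Below is a minimal sketch of that pattern, using a made-up sample dict.

```python
from functools import reduce
from operator import getitem

nested = {"a": {"b": {"c": 42}}}

# Equivalent to getitem(getitem(getitem(nested, "a"), "b"), "c")
value = reduce(getitem, ["a", "b", "c"], nested)
print(value)  # 42

# The same fold written as an explicit loop
acc = nested
for key in ["a", "b", "c"]:
    acc = getitem(acc, key)
print(acc)  # 42
```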

Question: In the code snippet shown below, why does the reduce call result in an exception, when the other ways of making the nested calls do not?

My unsubstantiated guess: something in either functools or operator sets the getitem identifier to point at *either* list.__getitem__ OR dict.__getitem__, and when asked to play nice with reduce it gets stuck on one or the other and can't switch back and forth.
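You can test this guess directly. `operator.getitem(a, b)` simply evaluates `a[b]`, so the type of `a` decides which indexing runs, and it decides that on every call. The check below uses the question's data and walks the keys one step at a time.

```python
from operator import getitem

data_obj = {"alpha": {"bravo": [{"charlie": 1}, {"delta": 2}]}}

node = data_obj
for key in ["alpha", "bravo", 0, "charlie"]:
    # Each call dispatches on the current node's type: dict, dict, list, dict
    print(type(node).__name__, repr(key))
    node = getitem(node, key)
print(node)  # 1
```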

Code:

$ python3 -q
>>> data_obj = { 
...     'alpha': { 
...         'bravo': [
...             {'charlie': 1}, 
...             {'delta': 2},
...         ]
...     }
... }
>>> 
>>> node_keys = ['alpha', 'bravo', 0, 'charlie']
>>> 
>>> from functools import reduce
>>> from operator import getitem
>>> 
>>> reduce(getitem, data_obj, node_keys)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not str
>>> 
>>> data_obj[node_keys[0]][node_keys[1]][node_keys[2]][node_keys[3]]
1
>>> getitem(
...     getitem(
...         getitem(
...             getitem(data_obj, node_keys[0]),
...             node_keys[1]
...         ), node_keys[2]
...     ), node_keys[3]
... )
1
>>>
>>> data_obj.__getitem__(node_keys[0])\
...         .__getitem__(node_keys[1])\
...         .__getitem__(node_keys[2])\
...         .__getitem__(node_keys[3])
1
>>>
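Tracing the failing call by hand shows what `reduce` actually receives. In that argument order, `data_obj` is the iterable and `node_keys` is the starting value. Iterating over a dict yields its keys.

```python
from operator import getitem

data_obj = {"alpha": {"bravo": [{"charlie": 1}, {"delta": 2}]}}
node_keys = ["alpha", "bravo", 0, "charlie"]

# reduce(getitem, data_obj, node_keys) iterates over data_obj, which yields its keys
print(list(data_obj))  # ['alpha']

# So the only step is getitem(node_keys, 'alpha'): a list indexed by a str
try:
    getitem(node_keys, "alpha")
except TypeError as exc:
    print(exc)  # list indices must be integers or slices, not str
```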

Solution

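  • The arguments are in the wrong order. It should be reduce(getitem, node_keys, data_obj)

    The signature is functools.reduce(function, iterable[, initial]). The iterable to fold over comes second, and the optional starting value comes third. Your data object is the starting value, so it belongs in the third position. With the original order, reduce iterates over data_obj itself, and iterating a dict yields its keys (here just 'alpha'). It also starts from node_keys. Its first step is therefore getitem(node_keys, 'alpha'), which indexes a list with a str and raises the TypeError. Nothing gets stuck on one type: operator.getitem(a, b) just evaluates a[b], so dict or list indexing is chosen afresh on every call.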
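Putting the pieces together, here is a minimal sketch of the function described in the question. It makes two assumptions: digit-only path components are list indices, and a missing location returns a default value instead of raising. The name `get_by_path` and its `default` parameter are hypothetical.

```python
from functools import reduce
from operator import getitem


def get_by_path(data_obj, path, default=None):
    """Follow a slash-separated path into nested dicts/lists.

    Assumptions (hypothetical helper): digit-only components are list
    indices; any failed lookup returns `default` instead of raising.
    """
    keys = [int(p) if p.isdigit() else p for p in path.strip("/").split("/") if p]
    try:
        # Keys are the iterable (2nd argument); data_obj is the initial value (3rd)
        return reduce(getitem, keys, data_obj)
    except (KeyError, IndexError, TypeError):
        return default


data_obj = {"alpha": {"bravo": [{"charlie": 1}, {"delta": 2}]}}
print(get_by_path(data_obj, "/alpha/bravo/0/charlie"))  # 1
print(get_by_path(data_obj, "/alpha/bravo/5/charlie"))  # None
```

Catching TypeError also swallows cases such as a str key applied to a list. If you would rather surface those, raise or log instead of returning the default.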