Search code examples
pythoncombinationspython-itertoolscartesian-product

Combination of nested dictionaries with arbitrary lengths in python


I am looking for a function that will take a nested dictionary, and produce the combinations / product of the values.

My query is similar to the problem specified here, but I can't seem to adapt the answers their to fit my needs: Cartesian product of nested dictionaries of lists

I wish to have an input like this:

d = {
  "country": [1, 2],
  "health_state": [
    {"healthy": [1]},
    {"breast cancer": {"stage": [1, 2]}}
  ]
}

Produce an output as follows:

[
{{"country":1},{"health state":{"healthy":1}}},
{{"country":2},{"health state":{"healthy":1}}},
{{"country":1},{"health state":{"breast cancer":{"stage":1}}}},
{{"country":1},{"health state":{"breast cancer":{"stage":2}}}},
{{"country":2},{"health state":{"breast cancer":{"stage":1}}}},
{{"country":2},{"health state":{"breast cancer":{"stage":2}}}}
]

In this example, the output is a list of 'states' that a person can occupy

  • any two elements of a list (input) should not be in the same element of the returned list (output), e.g. someone cannot be in country 1 and country 2 simultaneously
  • all keys in a dictionary (input) should be returned in the same element of the list (output), e.g. someone is in country 1 and also in a health_state. If that health state is 'breast cancer' they are also in stage 1 or stage 2

I can envision a solution that requires lots of for loops, checking whether elements are dictionaries, lists or neither, but this seems inefficient, especially for deeply nested dictionaries. I suspect there is a more elegant solution using itertools.product and recursion perhaps?


Solution

  • You can use recursion with itertools.product:

    import itertools as it
    d = {'country': [1, 2], 'health_state': [{'healthy': [1]}, {'breast cancer': {'stage': [1, 2]}}]}
    def c_prod(d):
      if isinstance(d, list):
         for i in d:
            yield from ([i] if not isinstance(i, (dict, list)) else c_prod(i))
      else:
         for i in it.product(*map(c_prod, d.values())):
            yield dict(zip(d.keys(), i))
    
    print(list(c_prod(d)))
    

    Output:

    [{'country': 1, 'health_state': {'healthy': 1}}, 
     {'country': 1, 'health_state': {'breast cancer': {'stage': 1}}}, 
     {'country': 1, 'health_state': {'breast cancer': {'stage': 2}}}, 
     {'country': 2, 'health_state': {'healthy': 1}}, 
     {'country': 2, 'health_state': {'breast cancer': {'stage': 1}}}, 
     {'country': 2, 'health_state': {'breast cancer': {'stage': 2}}}]
    

    The output from the code above produces a list of dictionaries, but your desired output mirrors a list of list of dictionaries (list[list[dict]]), thus, a final transformation can be made:

    r = [[{j:k} for j, k in i.items()] for i in c_prod(d)]
    

    Output:

    [[{'country': 1}, {'health_state': {'healthy': 1}}], [{'country': 1}, {'health_state': {'breast cancer': {'stage': 1}}}], [{'country': 1}, {'health_state': {'breast cancer': {'stage': 2}}}], [{'country': 2}, {'health_state': {'healthy': 1}}], [{'country': 2}, {'health_state': {'breast cancer': {'stage': 1}}}], [{'country': 2}, {'health_state': {'breast cancer': {'stage': 2}}}]]