Can I flatten a deeply nested Python dictionary which contains values with lists of more nested dictionaries?

I am working with a large XML file from which I have been trying to extract keys and values. The information in this file is very sensitive, so I cannot share it. I started by using the xml library. However, after hours of frustration I discovered the xmltodict library. I used this library to convert my XML to a dictionary (something I am much more familiar with than XML).

import xmltodict

# convert xml to dictionary
dict_nested = xmltodict.parse(str_xml)

Now that the xml is a dictionary, I would like to flatten it because there are a large number of levels (I don't know how many levels), while creating key names that help me trace the path to their corresponding value. Thus, I tried:

from flatten_dict import flatten

# flatten dict_nested 
dict_flat = flatten(dict_nested)

The result may look something like this but with many more layers:

{'ID': '123',
 'info': [{'breed':'collie'}, 
          {'fur': [{'short':'no'}, 
                   {'color':[{'black':'no'},
                             {'brown':'yes'}]}]}]}

This worked well, as my keys are tuples showing the path of layers. My values are either strings (i.e., the end result I am looking for) or lists of OrderedDict objects.

Since each dictionary in each list needs to be flattened and I don't know how deep the nesting goes, I am trying to find a way to programmatically flatten all dictionaries until every key corresponds to a single value (i.e., not a list or dictionary).

Ideally, the output would look something like this:

{'ID':'123',
 'info_breed':'collie',
 'info_fur_short':'no',
 'info_fur_color_black':'no',
 'info_fur_color_brown':'yes'}

Sorry that I cannot share more of my output because of the sensitive information.

Solution

You can use a recursive approach, taking into consideration that your dict's values are either strings or lists of other dicts:

dict_flat = {'ID': '123',
 'info': [{'breed':'collie'}, 
          {'fur': [{'short':'no'}, 
                   {'color':[{'black':'no'},
                             {'brown':'yes'}]}]}]}

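def my_flatten(nested, key_prefix=None):
    """Flatten a dict whose values are either strings or lists of dicts."""
    result = {}
    for k, v in nested.items():
        # Build the path-style key, e.g. 'info_fur_color'
        key = f'{key_prefix}_{k}' if key_prefix is not None else k
        if isinstance(v, list):
            # Each list item is assumed to be a dict: recurse with the current key as prefix
            for item in v:
                result.update(my_flatten(item, key))
        else:
            # Anything that is not a list is stored as a final value
            result[key] = v
    return result

my_flatten(dict_flat)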

Output:

{'ID': '123',
 'info_breed': 'collie',
 'info_fur_short': 'no',
 'info_fur_color_black': 'no',
 'info_fur_color_brown': 'yes'}
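
The function above relies on every value being either a string or a list of dicts. xmltodict can also produce dicts nested directly inside dicts, and lists that hold plain strings, so here is a slightly more general sketch. The name flatten_any, the sep parameter, and the choice to append the list index for scalar list items are my own assumptions, not something from the question or from either library. It also assumes the top-level input is a dict:

```python
from collections.abc import Mapping


def flatten_any(value, prefix=None, sep="_"):
    """Hypothetical helper: flatten nested mappings and lists into one flat dict."""
    result = {}
    if isinstance(value, Mapping):
        # Dicts (including OrderedDict) extend the key path with each of their keys
        for k, v in value.items():
            key = f"{prefix}{sep}{k}" if prefix is not None else str(k)
            result.update(flatten_any(v, key, sep))
    elif isinstance(value, list):
        for i, item in enumerate(value):
            if isinstance(item, (Mapping, list)):
                # Containers inside a list keep the same prefix, as in the desired output
                result.update(flatten_any(item, prefix, sep))
            else:
                # Assumption: scalar list items get their index appended to stay distinct
                result[f"{prefix}{sep}{i}"] = item
    else:
        # A leaf value: store it under the path built so far
        result[prefix] = value
    return result


nested = {
    "ID": "123",
    "info": {
        "breed": "collie",
        "fur": [{"short": "no"}, {"color": {"black": "no", "brown": "yes"}}],
        "tags": ["a", "b"],
    },
}
flat = flatten_any(nested)
print(flat)
```

Note that, like my_flatten, this merges dicts from the same list under the same prefix, so if two dicts in one list share a key (for example repeated XML elements), the later value overwrites the earlier one. If that can happen in your data, you would need to add an index to those keys as well.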