
How to extract nested dictionaries from a dictionary into a single flat dictionary?


I have a dictionary in which some values are plain strings or numbers, but others are themselves dictionaries. The data looks like this:

{'amount': 123,
 'baseUnit': 'test',
 'currency': {'code': 'EUR'},
 'dimensions': {'height': {'iri': 'http://www.example.com/data/measurement-height-12345',
                           'unitOfMeasure': 'm',
                           'value': 23},
                'length': {'iri': 'http://www.example.com/data/measurement-length-12345',
                           'unitOfMeasure': 'm',
                           'value': 8322},
                'volume': {'unitOfMeasure': '', 'value': 0},
                'weight': {'iri': 'http://www.example.com/data/measurement-weight-12345',
                           'unitOfMeasure': 'KG',
                           'value': 23},
                'width': {'iri': 'http://www.example.com/data/measurement-width-12345',
                          'unitOfMeasure': 'm',
                          'value': 1}},
 'exportListNumber': '1234',
 'iri': 'http://www.example.com/data/material-12345',
 'number': '12345',
 'orderUnit': 'sdf',
 'producerFormattedPID': '12345',
 'producerID': 'example',
 'producerNonFormattedPID': '12345',
 'stateID': 'm70',
 'typeID': 'FERT'}

The dimensions and price keys have nested dictionaries as values. How can I extract that data so that the final variable is a flat dictionary, with no dictionaries left as values? For the price, I would need something like {'pricecurrencycode': 'EUR', 'priceamount': 123} instead of 'price': {'currency': {'code': 'EUR'}, 'amount': 123}. The same should happen for the dimensions key: all the nested dictionaries should be extracted, so that the result is easier to transform into a final DataFrame.


Solution

  • You can define a recursive flatten function that calls itself whenever a value is a dictionary, passing the key built so far as the prefix for the nested keys.

    Assuming Python >= 3.9 (the dict |= update operator was added in 3.9):

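    def flatten(my_dict, prefix=""):
        """Return a flat copy of my_dict; nested keys are concatenated onto prefix."""
        res = {}
        for k, v in my_dict.items():
            if isinstance(v, dict):
                # Recurse with the key built so far as the new prefix,
                # then merge the flattened result into res (Python 3.9+).
                res |= flatten(v, prefix + k)
            else:
                # Non-dict value: store it under the full concatenated key.
                res[prefix + k] = v
        return res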
    

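    For older Python versions, merge the flattened sub-dictionary with dict.update instead:

    def flatten(my_dict, prefix=""):
        """Return a flat copy of my_dict; nested keys are concatenated onto prefix."""
        res = {}
        for k, v in my_dict.items():
            if isinstance(v, dict):
                # Recurse and copy the flattened sub-dictionary into res.
                res.update(flatten(v, prefix + k))
            else:
                res[prefix + k] = v
        return res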
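If the nesting could get very deep (on the order of Python's default recursion limit of 1000 frames), the recursive versions above would fail. The sketch below is a non-recursive alternative that walks the nesting with an explicit stack. The name flatten_iter and its sep parameter are illustrative additions, not part of the original answer. With the default sep="" it produces the same concatenated keys (for example currencycode) and the same key order as the recursive flatten.

```python
def flatten_iter(my_dict, sep=""):
    """Flatten nested dicts without recursion (illustrative sketch).

    Keys are joined as prefix + sep + key; the default sep="" reproduces
    the concatenated keys of the recursive flatten (e.g. "currencycode").
    """
    res = {}
    # Stack of (key prefix, iterator over that level's items). Consuming the
    # iterators lazily keeps the same key order as the recursive version.
    stack = [("", iter(my_dict.items()))]
    while stack:
        prefix, items = stack[-1]
        for k, v in items:
            key = f"{prefix}{sep}{k}" if prefix else k
            if isinstance(v, dict):
                # Pause this level and start walking the nested dict.
                stack.append((key, iter(v.items())))
                break
            res[key] = v
        else:
            # This level is exhausted: drop it and resume its parent.
            stack.pop()
    return res


# Trimmed excerpt of the question's data.
sample = {
    "amount": 123,
    "currency": {"code": "EUR"},
    "dimensions": {"height": {"unitOfMeasure": "m", "value": 23}},
    "typeID": "FERT",
}
print(flatten_iter(sample))
# {'amount': 123, 'currencycode': 'EUR', 'dimensionsheightunitOfMeasure': 'm', 'dimensionsheightvalue': 23, 'typeID': 'FERT'}
```

Passing sep="_" gives more readable keys such as currency_code and dimensions_height_value, which may be easier to work with once the flattened dictionaries become DataFrame columns.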