Search code examples
pythonpandasdictionarynested

building a df from nested lists and dictionaries in a more Pythonic DRY way


i have a list with dictionaries in which some dictionaries have a list of dictionaries too:

samples = [
    {"person": "A", "employed": True, "location": "East", 
    "donations": []},
    {"person": "B", "employed": False, "location": "West", 
    "donations": [
                 {"type": "cash", "date": "july 25, 2022", "value": 50},
                 {"type": "stocks", "date": "May 1, 2022", "value": 2500}
                 ]},
    {"person": "C", "employed": False, "location": "North",
    "donations": []}
]

I wanted process this data to create a dataframe that has flattened the nested dictionaries and added new column headers like this:

enter image description here

Here's the code that I have have, but it has a lot of repetition. What is an approach to make it more Pythonic and DRY:

data_list = [] ## initial empty list
for sample in samples:
    ## get data that is not nested
    person = sample.get("person")
    employed = sample.get("employed")
    location = sample.get("location")
    ## if there is not nested data, give specific values
    if len(sample.get("donations")) < 1:
        donation_type = donation_date =  donation_value = "NONE LISTED"
        ## build list
        data_list.append({"person": person,
                  "employed": employed,
                  "location": location,
                  "donation_type": donation_type,
                  "donation_date": donation_date,
                  "donation_value": donation_value,
                 })
    else:
        ## build same list but with provided values
        donations = sample.get("donations")
        for donation in donations:
            data_list.append({"person": person,
                "employed": employed,
                "location": location,
                "donation_type": donation.get("type"),
                "donation_date": donation.get("date"),
                "donation_value": donation.get("value")
                })


Solution

  • I like pandas.json_normalize for this because you don't need to supply the column names:

    df = pd.DataFrame(samples).drop('donations', axis=1)
    donations = pd.json_normalize(samples, record_path='donations', meta=['person'])
    pd.merge(df, donations, how= 'left', on = 'person').apply(lambda x: x.fillna("NONE LISTED"))
    

    Output:

    person  employed    location    type    date    value
    0   A   True    East    NONE LISTED NONE LISTED NONE LISTED
    1   B   False   West    cash    july 25, 2022   50
    2   B   False   West    stocks  May 1, 2022 2500
    3   C   False   North   NONE LISTED NONE LISTED NONE LISTED