Search code examples
pythonpandasdictionaryunpack

How can I unpack a nested dictionary where not every top level key has all of the second level keys?


How can I unpack a nested dictionary where not every top level key has all of the second level keys?

I scraped data on properties from a website. The website offered up to 7 attributes for each property, but the attribute types varied between properties (i.e. "Land" property types don't show "Building Size" as an attribute because there is no building).

As a first step to get around this, I scraped both the attribute type and the value as separate columns and translated the data into a dictionary form where each property has a unique ID_Number and a series of key:value pairs. Now I want to unpack that dictionary into a data frame where the column headers would represent all of the possible second level keys (attribute types) and the column values would be the "value" associated with the attribute key.

An example of the data is shown below:

{1: [{'Status:': 'For Lease',
   'Price:': '$3.17 SF/Mo',
   'Property Type:': 'Retail',
   'Sub-Type:': 'Office, Retail',
   'Spaces:': '2 Spaces',
   'Space Available:': '0.00 - 0.03 AC',
   'Building Size:': '9,161 SF'}],
 2: [{'Status:': 'For Lease',
   'Price:': '$1.25 SF/Mo',
   'Property Type:': 'Office',
   'Sub-Type:': 'Office',
   'Spaces:': '1 Space',
   'Space Available:': '0.03 AC',
   'Building Size:': '11,332 SF'}],
 3: [{'Status:': 'For Sale',
   'Price:': 2521740,
   'Property Type:': 'Retail',
   'Sub-Type:': 'Fast Food',
   'Building Size:': '2,410 SF',
   'Cap Rate:': 0.0575,
   'Lot Size:': '76,666 SF'}],
 4: [{'Status:': 'For Lease',
   'Price:': '$0.63 SF/Mo',
   'Property Type:': 'Retail',
   'Sub-Type:': 'Retail',
   'Spaces:': '1 Space',
   'Space Available:': '0.50 AC',
   'Building Size:': '59,095 SF'}],

How would I go about extracting this? I have tried several variations on from_dict, but haven't found a solution that works.

Thanks in advance!


Solution

  • There are a few ways you could do this. I'm not a pandas expert, so there may be a more elegant solution. But here's how I would do it quick and dirty (BTW, you have 9 unique attributes in the sample data you provided, not 7). This will automatically handle missing values by making them NaN:

    import pandas as pd
    
    data = {1: [{'Building Size:': '9,161 SF',
                  'Price:': '$3.17 SF/Mo',
                  'Property Type:': 'Retail',
                  'Space Available:': '0.00 - 0.03 AC',
                  'Spaces:': '2 Spaces',
                  'Status:': 'For Lease',
                  'Sub-Type:': 'Office, Retail'}],
             2: [{'Building Size:': '11,332 SF',
                  'Price:': '$1.25 SF/Mo',
                  'Property Type:': 'Office',
                  'Space Available:': '0.03 AC',
                  'Spaces:': '1 Space',
                  'Status:': 'For Lease',
                  'Sub-Type:': 'Office'}],
             3: [{'Building Size:': '2,410 SF',
                  'Cap Rate:': 0.0575,
                  'Lot Size:': '76,666 SF',
                  'Price:': 2521740,
                  'Property Type:': 'Retail',
                  'Status:': 'For Sale',
                  'Sub-Type:': 'Fast Food'}],
             4: [{'Building Size:': '59,095 SF',
                  'Price:': '$0.63 SF/Mo',
                  'Property Type:': 'Retail',
                  'Space Available:': '0.50 AC',
                  'Spaces:': '1 Space',
                  'Status:': 'For Lease',
                  'Sub-Type:': 'Retail'}],
            }
    
    df = pd.DataFrame()
    for property_num, property_list in data.items():
        for property_dict in property_list:  # you only have one per list, so this isn't really needed
            df = df.append(property_dict, True)
    df.index = data.keys()
    
    
    
    >>> print(df)
      Building Size:       Price:  ... Cap Rate:  Lot Size:
    1       9,161 SF  $3.17 SF/Mo  ...       NaN        NaN
    2      11,332 SF  $1.25 SF/Mo  ...       NaN        NaN
    3       2,410 SF      2521740  ...    0.0575  76,666 SF
    4      59,095 SF  $0.63 SF/Mo  ...       NaN        NaN
    
    [4 rows x 9 columns]