Search code examples
jsonpandasdataframenormalize

Pandas for reading nested json with json_normalize or read_json


I have a json object with the following structure:

import pandas as pd
json_data_raw = [{"indicator_value":{"195606": 
                                              {"2010":{"AFG": 0.29, 
                                                       "ZWE": 0.607}, 
                                               "2011": {"AFG": 0.406, 
                                                       "ZWE": 0.737}, 
                                               "2012": {"AFG": 0.345, 
                                                       "ZWE": 0.587}, 
                                               "2013": {"AFG": 0.28, 
                                                       "ZWE": 0.871}, 
                                               "2014": {"AFG": 0.253, 
                                                       "ZWE": 0.88}, 
                                               "2015": {"AFG": 0.262, 
                                                       "ZWE": 0.88}, 
                                               "2016": {"AFG": 0.245, 
                                                       "ZWE": 0.77}, 
                                               "2017": {"AFG": 0.247, 
                                                       "ZWE": 0.845}, 
                                               "2018": {"AFG": 0.254, 
                                                       "ZWE": 0.849}}}, 
                  "country_name": {"AFG": "Afghanistan", 
                                   "ZWE": "Zimbabwe"}, 
                  "indicator_name": {"195606": "Carbon dioxide emissions, production emissions per capita (tonnes)"}}]

When I try to read this structure with pd.read_json method I manage to load only the first node of the data, namely "indicator_value", "country_name" and "indicator_name" with the nested instances only.

I tried also to use pd.json_normalize and no success neither since I don't quite get how to specify the parameters record_path or meta (metadata) contained in my json object and getting odd results.

Ideally I would like to get a table like this. enter image description here

and so on for the other country/year/values...

Probably this is more than a single task and will involve other kind of object manipulation. In any case I much appreciate some help.


Solution

  • Firstly create a variable named d....basically it contains the inner dictionary

    d=json_data_raw[0]
    

    Now create a dataframe df:-

    df=pd.DataFrame(list(list(d.values())[0].values())[0])
    

    so here list(list(d.values())[0].values())[0] have 3 level deeper dictionary

    now,

    df=df.stack().to_frame()
    

    Note:-here you can overwrite df or create a new variable the above code make a multi index dataframe so to remove multi index we use:-

    df.reset_index(inplace=True)
    

    now to rename columns:-

    df.columns=['country_code','year','Carbon dioxide emssions,production emission per capita(tonnes)']
    

    Now we will create a function that will change the country_code column to country name i.e

    def country(val):
        if val=='AFG':
            return 'Afganistan'
        else:
            return 'Zimbabwe'
    

    Now we will use apply() method

    country=df['country_code'].apply(country)
    

    Now finally we will insert the country column in our dataframe df

    df.insert(0,'country_name',country)
    

    Now if you print df you will get your desired output

    output:-

        country_name    country_code    year    Carbon dioxide emssions,production emission per capita(tonnes)
    0   Afganistan               AFG    2010    0.290
    1   Afganistan               AFG    2011    0.406
    2   Afganistan               AFG    2012    0.345
    3   Afganistan               AFG    2013    0.280
    4   Afganistan               AFG    2014    0.253
    5   Afganistan               AFG    2015    0.262
    6   Afganistan               AFG    2016    0.245
    7   Afganistan               AFG    2017    0.247
    8   Afganistan               AFG    2018    0.254
    9   Zimbabwe                 ZWE    2010    0.607
    10  Zimbabwe                 ZWE    2011    0.737
    11  Zimbabwe                 ZWE    2012    0.587
    12  Zimbabwe                 ZWE    2013    0.871
    13  Zimbabwe                 ZWE    2014    0.880
    14  Zimbabwe                 ZWE    2015    0.880
    15  Zimbabwe                 ZWE    2016    0.770
    16  Zimbabwe                 ZWE    2017    0.845
    17  Zimbabwe                 ZWE    2018    0.849
    

    Note:- well see I create a function because in your json object there are only 2 country and if you have country more than 2 and you have this same json format then the code before I define/create function country() work as it is.

    but then instead of making function and then using apply() method use this:-

    countryinfo=list(json_data_raw[0].values())[1]
    

    Now if you print countryinfo you get a dictionary in which keys are country code and its values are country name

    Output of countryinfo:-

    {'AFG': 'Afghanistan', 'ZWE': 'Zimbabwe'}
    

    well by list(json_data_raw[0].values())[1] we are grabbing the data of "country_name" which is stored inside your json object:- json_data_raw and storing it in a variable named country info

    so now

    country=df['country_code'].replace(countryinfo.keys(),countryinfo.values())
    

    so here basically we are replacing the value of df['country_code'] with the values of countryinfo dictionary so it will returns a Series of values and we are storing that Series in variable named country

    and finally no we insert 'country_name' column at index 0 in our dataframe df

    df.insert(0,'country_name',country)
    

    Now if you print df then you will get your desired output regardless of number of countries you have in your json_data_raw

    Output of df:-

        country_name    country_code    year    Carbon dioxide emssions,production emission per capita(tonnes)
    0   Afganistan               AFG    2010    0.290
    1   Afganistan               AFG    2011    0.406
    2   Afganistan               AFG    2012    0.345
    3   Afganistan               AFG    2013    0.280
    4   Afganistan               AFG    2014    0.253
    5   Afganistan               AFG    2015    0.262
    6   Afganistan               AFG    2016    0.245
    7   Afganistan               AFG    2017    0.247
    8   Afganistan               AFG    2018    0.254
    9   Zimbabwe                 ZWE    2010    0.607
    10  Zimbabwe                 ZWE    2011    0.737
    11  Zimbabwe                 ZWE    2012    0.587
    12  Zimbabwe                 ZWE    2013    0.871
    13  Zimbabwe                 ZWE    2014    0.880
    14  Zimbabwe                 ZWE    2015    0.880
    15  Zimbabwe                 ZWE    2016    0.770
    16  Zimbabwe                 ZWE    2017    0.845
    17  Zimbabwe                 ZWE    2018    0.849