I have a json object
with the following structure:
import pandas as pd
json_data_raw = [{"indicator_value":{"195606":
{"2010":{"AFG": 0.29,
"ZWE": 0.607},
"2011": {"AFG": 0.406,
"ZWE": 0.737},
"2012": {"AFG": 0.345,
"ZWE": 0.587},
"2013": {"AFG": 0.28,
"ZWE": 0.871},
"2014": {"AFG": 0.253,
"ZWE": 0.88},
"2015": {"AFG": 0.262,
"ZWE": 0.88},
"2016": {"AFG": 0.245,
"ZWE": 0.77},
"2017": {"AFG": 0.247,
"ZWE": 0.845},
"2018": {"AFG": 0.254,
"ZWE": 0.849}}},
"country_name": {"AFG": "Afghanistan",
"ZWE": "Zimbabwe"},
"indicator_name": {"195606": "Carbon dioxide emissions, production emissions per capita (tonnes)"}}]
When I try to read this structure with pd.read_json
method I manage to load only the first node of the data, namely "indicator_value"
, "country_name"
and "indicator_name"
with the nested instances only.
I tried also to use pd.json_normalize
and no success neither since I don't quite get how to specify the parameters record_path or meta (metadata) contained in my json object
and getting odd results.
Ideally I would like to get a table like this.
and so on for the other country/year/values...
Probably this is more than a single task and will involve other kind of object manipulation. In any case I much appreciate some help.
Firstly create a variable
named d
....basically it contains the inner dictionary
d=json_data_raw[0]
Now create a dataframe df
:-
df=pd.DataFrame(list(list(d.values())[0].values())[0])
so here list(list(d.values())[0].values())[0]
have 3 level deeper dictionary
now,
df=df.stack().to_frame()
Note:-here you can overwrite df
or create a new variable
the above code make a multi index dataframe
so to remove multi index
we use:-
df.reset_index(inplace=True)
now to rename columns:-
df.columns=['country_code','year','Carbon dioxide emssions,production emission per capita(tonnes)']
Now we will create a function that will change the country_code column to country name i.e
def country(val):
if val=='AFG':
return 'Afganistan'
else:
return 'Zimbabwe'
Now we will use apply()
method
country=df['country_code'].apply(country)
Now finally we will insert the country column in our dataframe df
df.insert(0,'country_name',country)
Now if you print df
you will get your desired output
output:-
country_name country_code year Carbon dioxide emssions,production emission per capita(tonnes)
0 Afganistan AFG 2010 0.290
1 Afganistan AFG 2011 0.406
2 Afganistan AFG 2012 0.345
3 Afganistan AFG 2013 0.280
4 Afganistan AFG 2014 0.253
5 Afganistan AFG 2015 0.262
6 Afganistan AFG 2016 0.245
7 Afganistan AFG 2017 0.247
8 Afganistan AFG 2018 0.254
9 Zimbabwe ZWE 2010 0.607
10 Zimbabwe ZWE 2011 0.737
11 Zimbabwe ZWE 2012 0.587
12 Zimbabwe ZWE 2013 0.871
13 Zimbabwe ZWE 2014 0.880
14 Zimbabwe ZWE 2015 0.880
15 Zimbabwe ZWE 2016 0.770
16 Zimbabwe ZWE 2017 0.845
17 Zimbabwe ZWE 2018 0.849
Note:- well see I create a function
because in your json
object there are only 2 country and if you have country more than 2 and you have this same json
format then the code before I define/create function country()
work as it is.
but then instead of making function
and then using apply()
method
use this:-
countryinfo=list(json_data_raw[0].values())[1]
Now if you print countryinfo
you get a dictionary
in which keys
are country code and its values
are country name
Output of countryinfo
:-
{'AFG': 'Afghanistan', 'ZWE': 'Zimbabwe'}
well by list(json_data_raw[0].values())[1]
we are grabbing the data of "country_name"
which is stored inside your json object:- json_data_raw
and storing it in a variable
named country info
so now
country=df['country_code'].replace(countryinfo.keys(),countryinfo.values())
so here basically we are replacing the value of df['country_code']
with the values of countryinfo dictionary
so it will returns a Series
of values and we are storing that Series
in variable
named country
and finally no we insert 'country_name'
column at index 0
in our dataframe df
df.insert(0,'country_name',country)
Now if you print df then you will get your desired output regardless of number of countries you have in your json_data_raw
Output of df
:-
country_name country_code year Carbon dioxide emssions,production emission per capita(tonnes)
0 Afganistan AFG 2010 0.290
1 Afganistan AFG 2011 0.406
2 Afganistan AFG 2012 0.345
3 Afganistan AFG 2013 0.280
4 Afganistan AFG 2014 0.253
5 Afganistan AFG 2015 0.262
6 Afganistan AFG 2016 0.245
7 Afganistan AFG 2017 0.247
8 Afganistan AFG 2018 0.254
9 Zimbabwe ZWE 2010 0.607
10 Zimbabwe ZWE 2011 0.737
11 Zimbabwe ZWE 2012 0.587
12 Zimbabwe ZWE 2013 0.871
13 Zimbabwe ZWE 2014 0.880
14 Zimbabwe ZWE 2015 0.880
15 Zimbabwe ZWE 2016 0.770
16 Zimbabwe ZWE 2017 0.845
17 Zimbabwe ZWE 2018 0.849