I am working with a multi-index (country, year) DataFrame in pandas containing GDP data in local currency units at current prices, e.g.:
gdp
country year
AUS 2013 274865000000
2012 269562000000
2011 251727000000
2010 233604000000
2009 221002000000
2008 202260000000
USA 2013 2550000000000
... ...
I would like to create a new column containing the GDP deflator with 2010 as the base year, e.g.:
gdp gdpdef
country year
AUS 2013 274865000000 1.18
2012 269562000000 1.15
2011 251727000000 1.08
2010 233604000000 1.00
2009 221002000000 0.95
2008 202260000000 0.87
USA 2013 2550000000000 1.01
... ... ...
2010 2520000000000 1
.... ... ...
To be explicit, each entry in "gdpdef" is the ratio of GDP for country i in year t to GDP for country i in year 2010.
For a single country with the single index (year) I am able to achieve this end with:
base_year = df.at[2010, "gdp"]  # DataFrame.get_value was removed in pandas 1.0; .at is the scalar accessor
df["gdpdef"] = df["gdp"].div(base_year)
However, I am having a lot of trouble replicating this operation over the two indices (country, year) in a succinct manner. Any help toward this end is much appreciated.
You can create a dict of the base-year GDP per country and then either map the index with rename or map a column with map.
The MultiIndex complicates this. The first solution uses reset_index with level=1 to turn year into a column, then filters the 2010 rows by boolean indexing with loc, selecting the gdp column.
The second solution omits the level=1 parameter, so all index levels are converted to columns, and set_index is then needed to key the 2010 rows by country. Finally, values is added because df1 and df have different indexes, so the data would not be aligned on assignment.
print (df)
gdp
country year
AUS 2013 274865000000
2012 269562000000
2011 251727000000
2010 233604000000
2009 221002000000
2008 202260000000
USA 2013 2550000000000
2010 2546000000000
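# First solution: move only 'year' into a column, keep 'country' as the index
df1 = df.reset_index(level=1)
# dict of base-year (2010) GDP keyed by country
d = df1.loc[df1['year'] == 2010, 'gdp'].to_dict()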
print (d)
{'USA': 2546000000000, 'AUS': 233604000000}
df["gdpdef"] = df["gdp"].div(df1.rename(index=d).index)
print (df)
gdp gdpdef
country year
AUS 2013 274865000000 1.176628
2012 269562000000 1.153927
2011 251727000000 1.077580
2010 233604000000 1.000000
2009 221002000000 0.946054
2008 202260000000 0.865824
USA 2013 2550000000000 1.001571
2010 2546000000000 1.000000
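# Second solution: move both index levels into columns
df1 = df.reset_index()
# dict of base-year (2010) GDP keyed by country
d = df1[df1['year'] == 2010].set_index('country')['gdp'].to_dict()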
print (d)
{'USA': 2546000000000, 'AUS': 233604000000}
df["gdpdef"] = df1["gdp"].div(df1['country'].map(d)).values
print (df)
gdp gdpdef
country year
AUS 2013 274865000000 1.176628
2012 269562000000 1.153927
2011 251727000000 1.077580
2010 233604000000 1.000000
2009 221002000000 0.946054
2008 202260000000 0.865824
USA 2013 2550000000000 1.001571
2010 2546000000000 1.000000
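The role of values in the second solution can be checked with a small sketch. It rebuilds a four-row subset of the sample data (the subset and the variable names ratio and same_labels are illustrative, not from the original post) and shows that the computed ratio carries the default integer labels of df1 rather than the (country, year) labels of df, which is why the bare array is assigned instead. It uses to_numpy(), which behaves like values here.

```python
import pandas as pd

# Four-row subset of the sample data, in the same row order as the post
index = pd.MultiIndex.from_tuples(
    [("AUS", 2013), ("AUS", 2010), ("USA", 2013), ("USA", 2010)],
    names=["country", "year"],
)
df = pd.DataFrame(
    {"gdp": [274865000000, 233604000000, 2550000000000, 2546000000000]},
    index=index,
)

df1 = df.reset_index()
d = df1[df1["year"] == 2010].set_index("country")["gdp"].to_dict()
ratio = df1["gdp"].div(df1["country"].map(d))

# ratio is labelled 0..3, while df is labelled by (country, year)
same_labels = ratio.index.equals(df.index)

# Assigning the bare array skips label alignment and relies on row order,
# which is unchanged between df and df1
df["gdpdef"] = ratio.to_numpy()
```

Because the assignment is purely positional, it is only safe while df1 keeps the same row order as df.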
A solution with groupby and a custom function should be slower on a large DataFrame:
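def f(x):
    # divide each country's GDP by that country's 2010 value
    x['gdpdef'] = x['gdp'].div(x.loc[x.index.get_level_values('year') == 2010, 'gdp'].item())
    return x
# group_keys=False keeps the original (country, year) index in recent pandas versions
df = df.groupby(level='country', group_keys=False).apply(f)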
print (df)
gdp gdpdef
country year
AUS 2013 274865000000 1.176628
2012 269562000000 1.153927
2011 251727000000 1.077580
2010 233604000000 1.000000
2009 221002000000 0.946054
2008 202260000000 0.865824
USA 2013 2550000000000 1.001571
2010 2546000000000 1.000000
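As a possible alternative that is not part of the answer above, the base-year values can be selected with xs and broadcast across the country level using the level argument of div. This is a minimal sketch on a four-row subset of the sample data; the name base is illustrative, and it assumes every (country, year) pair is unique and every country has a 2010 row (a country without one would be expected to end up with NaN).

```python
import pandas as pd

# Four-row subset of the sample data
index = pd.MultiIndex.from_tuples(
    [("AUS", 2013), ("AUS", 2010), ("USA", 2013), ("USA", 2010)],
    names=["country", "year"],
)
df = pd.DataFrame(
    {"gdp": [274865000000, 233604000000, 2550000000000, 2546000000000]},
    index=index,
)

# GDP in the base year, indexed by country only
base = df["gdp"].xs(2010, level="year")

# Match base to df on the 'country' level and divide row by row
df["gdpdef"] = df["gdp"].div(base, level="country")
```

This avoids both the intermediate dict and the custom function, and the result is aligned by label rather than by row position.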