
Computing GDP deflators over two indices


I am working with a multi-index (country, year) DataFrame in pandas containing GDP data in local currency units at current prices, e.g.:

                gdp
country year    
AUS     2013    274865000000    
        2012    269562000000    
        2011    251727000000    
        2010    233604000000    
        2009    221002000000    
        2008    202260000000    
USA     2013    2550000000000
        ...     ...

I would like to create a new column containing the GDP deflator with 2010 as the base year, e.g.:

                gdp             gdpdef
country year    
AUS     2013    274865000000    1.18
        2012    269562000000    1.15
        2011    251727000000    1.08
        2010    233604000000    1.00
        2009    221002000000    0.95
        2008    202260000000    0.87
USA     2013    2550000000000   1.01
        ...     ...             ...
        2010    2520000000000   1.00
        ...     ...             ...

Where, to be explicit, each entry in "gdpdef" is given by the ratio of GDP for country i in year t to GDP for country i in year 2010.
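Written as a formula (notation introduced here, with i indexing countries and t indexing years), the requested column is:

```latex
\mathrm{gdpdef}_{i,t} = \frac{\mathrm{GDP}_{i,t}}{\mathrm{GDP}_{i,2010}},
\qquad \text{so } \mathrm{gdpdef}_{i,2010} = 1 \text{ for every country } i.
```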

For a single country, with a single index (year), I am able to achieve this with:

base_gdp = df.at[2010, "gdp"]  # scalar lookup; DataFrame.get_value was removed in pandas 1.0
df["gdpdef"] = df["gdp"].div(base_gdp)
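A self-contained sketch of the single-country case on the AUS rows alone, useful for checking it against the expected column above. The frame name `aus` is introduced here for illustration, and `.at` does the scalar lookup:

```python
import pandas as pd

# AUS rows from the question, indexed by year only (hypothetical frame name `aus`)
aus = pd.DataFrame(
    {"gdp": [274865000000, 269562000000, 251727000000,
             233604000000, 221002000000, 202260000000]},
    index=pd.Index([2013, 2012, 2011, 2010, 2009, 2008], name="year"),
)

# Divide every year's GDP by the 2010 value
base_gdp = aus.at[2010, "gdp"]
aus["gdpdef"] = aus["gdp"].div(base_gdp)
print(aus.round(2))
```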

However, I am having a lot of trouble replicating this operation over the two indices (country, year) in a succinct manner. Any help toward this end is much appreciated.


Solution

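  • You can build a dict that maps each country to its 2010 GDP, then use it either to replace the index labels (with rename) or to map a column (with map).

    The MultiIndex complicates this. So the first solution uses reset_index with level=1 to move year into a column, and then uses boolean indexing with loc to select the gdp values for 2010.

    The second solution calls reset_index without level=1, so all index levels are converted to columns, and set_index('country') is needed before building the dict. Finally, .values is added because df1 and df have different indexes, so the result would otherwise be aligned on index labels instead of assigned by position.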

    print (df)
                            gdp
    country year               
    AUS     2013   274865000000
            2012   269562000000
            2011   251727000000
            2010   233604000000
            2009   221002000000
            2008   202260000000
    USA     2013  2550000000000
            2010  2546000000000
    
    df1 = df.reset_index(level=1)
    d = df1.loc[df1['year'] == 2010, 'gdp'].to_dict()
    print (d)
    {'USA': 2546000000000, 'AUS': 233604000000}
    
    df["gdpdef"] = df["gdp"].div(df1.rename(index=d).index)
    print (df)
                            gdp    gdpdef
    country year                         
    AUS     2013   274865000000  1.176628
            2012   269562000000  1.153927
            2011   251727000000  1.077580
            2010   233604000000  1.000000
            2009   221002000000  0.946054
            2008   202260000000  0.865824
    USA     2013  2550000000000  1.001571
            2010  2546000000000  1.000000
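A variation on the first solution (a sketch, not part of the original answer) looks up each row's base value straight from the `country` index level with `Index.map`, which avoids the `rename` trick. One difference worth knowing: a country with no 2010 row simply gets NaN here, whereas with `rename` its label would stay a string and the division would typically fail with a TypeError. The sample below deliberately leaves out the USA 2010 row to show this:

```python
import pandas as pd

# Subset of the answer's sample; the USA 2010 row is omitted on purpose
idx = pd.MultiIndex.from_tuples(
    [("AUS", 2013), ("AUS", 2012), ("AUS", 2010), ("USA", 2013)],
    names=["country", "year"],
)
df = pd.DataFrame(
    {"gdp": [274865000000, 269562000000, 233604000000, 2550000000000]},
    index=idx,
)

# Same dict as in the first solution: country -> GDP in 2010
df1 = df.reset_index(level=1)
d = df1.loc[df1["year"] == 2010, "gdp"].to_dict()

# Map every row's country label to its base value; countries missing
# from d (no 2010 row) become NaN
base = df.index.get_level_values("country").map(d)
df["gdpdef"] = df["gdp"].div(base.to_numpy())
print(df)
```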
    

    df1 = df.reset_index()
    d = df1[df1['year'] == 2010].set_index('country')['gdp'].to_dict()
    print (d)
    {'USA': 2546000000000, 'AUS': 233604000000}
    
    df["gdpdef"] = df1["gdp"].div(df1['country'].map(d)).values
    print (df)
                            gdp    gdpdef
    country year                         
    AUS     2013   274865000000  1.176628
            2012   269562000000  1.153927
            2011   251727000000  1.077580
            2010   233604000000  1.000000
            2009   221002000000  0.946054
            2008   202260000000  0.865824
    USA     2013  2550000000000  1.001571
            2010  2546000000000  1.000000
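Another option, not from the original answer, skips the dict entirely: `xs` pulls out the 2010 rows as a Series indexed by country, and the `level` argument of `Series.div` broadcasts that Series across the `year` level. This sketch assumes each country has exactly one 2010 row:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("AUS", 2013), ("AUS", 2012), ("AUS", 2011), ("AUS", 2010),
     ("AUS", 2009), ("AUS", 2008), ("USA", 2013), ("USA", 2010)],
    names=["country", "year"],
)
df = pd.DataFrame(
    {"gdp": [274865000000, 269562000000, 251727000000, 233604000000,
             221002000000, 202260000000, 2550000000000, 2546000000000]},
    index=idx,
)

# Base-year GDP per country: a Series indexed by 'country' only
base = df.xs(2010, level="year")["gdp"]

# level="country" matches each row's country label against base's index
df["gdpdef"] = df["gdp"].div(base, level="country")
print(df)
```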
    

    A solution with groupby and a custom function should be slower on a large DataFrame:

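    def f(x):
        # 2010 GDP of this country (.item() raises unless there is exactly one 2010 row)
        base = x.loc[x.index.get_level_values('year') == 2010, 'gdp'].item()
        # assign returns a new frame instead of modifying the group in place
        return x.assign(gdpdef=x['gdp'].div(base))
    
    # group_keys=False stops pandas 2.x from prepending the country level a second time
    df = df.groupby(level='country', group_keys=False).apply(f)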
    print (df)
                            gdp    gdpdef
    country year                         
    AUS     2013   274865000000  1.176628
            2012   269562000000  1.153927
            2011   251727000000  1.077580
            2010   233604000000  1.000000
            2009   221002000000  0.946054
            2008   202260000000  0.865824
    USA     2013  2550000000000  1.001571
            2010  2546000000000  1.000000
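If the per-group function does turn out to be too slow, one vectorized alternative (a sketch, not from the original answer) is to blank out every non-2010 value with `where` and then spread each country's remaining value over all of its rows with `groupby(...).transform('first')`:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("AUS", 2013), ("AUS", 2012), ("AUS", 2011), ("AUS", 2010),
     ("AUS", 2009), ("AUS", 2008), ("USA", 2013), ("USA", 2010)],
    names=["country", "year"],
)
df = pd.DataFrame(
    {"gdp": [274865000000, 269562000000, 251727000000, 233604000000,
             221002000000, 202260000000, 2550000000000, 2546000000000]},
    index=idx,
)

# Keep gdp only on the 2010 rows (NaN elsewhere), then broadcast that value
# to every row of the same country; 'first' takes the first non-null value
is_base = df.index.get_level_values("year") == 2010
base = df["gdp"].where(is_base).groupby(level="country").transform("first")
df["gdpdef"] = df["gdp"] / base
print(df)
```

Countries without a 2010 row end up with NaN. If a country had more than one 2010 row, `'first'` would silently use the first one in row order, so checking for duplicates beforehand may be worthwhile.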