Search code examples
pythonpandasmulti-index

Column Differences for MultiIndex Dataframes


I've got (probably) quite a simple problem that I just cannot wrap my head around right now. I'm collecting the following two series:

from pandas_datareader import wb
countries = [
            'DZA', 'ARM','AZE','BLR','BIH','BRN','KHM','CHN','HRV', 'CZE','EGY',\
             'EST','GEO','HUN','IND','IDN','ISR','JPN','JOR','KAZ','KOR','KGZ','LAO','LVA',\
             'LBN','LTU','MYS','MDA','MNG','MMR','MKD','PHL','POL','ROU', 'RUS','SAU',\
             'SGP','SVK','SVN','TJK','THA','TUR','UKR','UZB','VNM'
]
dat = wb.download(indicator='FR.INR.LEND', country=countries, start=2010, end=2019)
dat.columns = ['lending_rate']
us = wb.download(indicator='FR.INR.LEND', country='US', start=2010, end=2019)
us.columns = ['lending_rate_us']
dat2=pd.concat([dat,us])
dat2

I'd like to take the difference between lending_rate and lending_rate_us but obviously would like to subtract lending_rate_us for the US only from lending_rate in all other countries (ie. avoid what would otherwise lead to NANs everywhere).

So I guess what I'm trying to do is to copy the values for lending_rate_us to all other countries to then take the difference between both columns.

Does anyone have an idea how to do that (or an alternative idea that makes more sense)?

Thanks!

EDIT:

I tried the following, alas without success:

from pandas_datareader import wb
countries = [
            'DZA', 'ARM','AZE','BLR','BIH','BRN','KHM','CHN','HRV', 'CZE','EGY',\
             'EST','GEO','HUN','IND','IDN','ISR','JPN','JOR','KAZ','KOR','KGZ','LAO','LVA',\
             'LBN','LTU','MYS','MDA','MNG','MMR','MKD','PHL','POL','ROU', 'RUS','SAU',\
             'SGP','SVK','SVN','TJK','THA','TUR','UKR','UZB','VNM'
]
dat = wb.download(indicator='FR.INR.LEND', country=countries, start=2010, end=2019)
dat.columns = ['lending_rate']
us = wb.download(indicator='FR.INR.LEND', country='US', start=2010, end=2019)
us.columns = ['lending_rate']

for i in dat.index.get_level_values(0).unique():
    dat["lending_rate_spread"]=dat.loc[i,:]-us.loc["United States",:]
dat

Output:

        lending_rate    lending_rate_spread
country year        
Armenia 
2019    12.141989   NaN
2018    12.793042   NaN
2017    14.406002   NaN
2016    17.356706   NaN
2015    17.590330   NaN
... ... ... ...
Vietnam 
2014    8.665000    NaN
2013    10.374167   NaN
2012    13.471667   NaN
2011    16.953833   NaN
2010    13.135250   NaN
450 rows × 2 columns

But when I just print the result of the loop without creating a new column I get the correct values:

for i in dat.index.get_level_values(0):
    print(dat.loc[i,:]-us.loc["United States",:])

Output:

      lending_rate
year              
2019      6.859489
2018      7.888875
2017     10.309335
2016     13.845039
2015     14.330330
2014     13.158665
2013     12.744987
2012     13.980068
2011     14.504474
2010     15.950428
      lending_rate
year              
2019     11.998715
2018     12.544167
2017     12.445833
2016     12.863560
2015     14.274167

I don't understand why I would get the correct result but not present it in the correct way?


Solution

  • In response to your comment, I reviewed the data again. I reworked the data for each country as NA data existed, and found that all of the data was for 10 years. The method @Paul commented on is possible, so I modified the code.

    dat['lending_rate_us'] = us['lending_rate_us']*len(dat['country'].unique())
    dat['lending_rate_spread'] = dat['lending_rate'] - us['lending_rate_us']
    dat.head()
    country     year    lending_rate    lending_rate_us     lending_rate_spread
    0   Armenia     2019    12.141989   237.7125    6.859489
    1   Armenia     2018    12.793042   220.6875    7.888875
    2   Armenia     2017    14.406002   184.3500    10.309335
    3   Armenia     2016    17.356706   158.0250    13.845039
    4   Armenia     2015    17.590330   146.7000    14.330330