Search code examples
pythonpivotpandastime-seriestabular

Pandas DataFrame from WB WDI data: combine year columns into "year" variable and merge rows


I have a dataset (.tsv file) with the following columns. (It's the World Bank's new WDI all-in all-time single-download dataset. Nice!)

country countrycode varname 1960 1961 1962
afghanistan AFG GDP 5.6 5.7 5.8
afghanistan AFG Gini .77 .78 .75
afghanistan AFG educ 8.1 8.2 8.3
afghanistan AFG pop 888 889 890
albania ALB GDP 6.6 6.7 6.8
albania ALB Gini .45 .46 .47
albania ALB educ 6.2 6.3 6.4
albania ALB pop 777 778 779

I need a pandas DataFrame with ['GDP','Gini','edu','pop'] as columns, along with ['country', 'countrycode', 'year']. So the values for "year" are currently columns! And I'd like there to be only one row for each country-year combination.

For instance, the columns and first row would be

country countrycode year GDP Gini educ pop
afghanistan AFG 1960 5.6 .77 8.1 888

This seems like some complex pivot or opposite-of-"melt", but I cannot figure it out.


Solution

  • In [59]: df
    Out[59]:
           country countrycode varname    1960    1961    1962
    0  afghanistan         AFG     GDP    5.60    5.70    5.80
    1  afghanistan         AFG    Gini    0.77    0.78    0.75
    2  afghanistan         AFG    educ    8.10    8.20    8.30
    3  afghanistan         AFG     pop  888.00  889.00  890.00
    4      albania         ALB     GDP    6.60    6.70    6.80
    5      albania         ALB    Gini    0.45    0.46    0.47
    6      albania         ALB    educ    6.20    6.30    6.40
    7      albania         ALB     pop  777.00  778.00  779.00
    
    In [60]: df = df.set_index(['country', 'countrycode', 'varname'])
    
    In [61]: df.columns.name = 'year'
    
    In [62]: df.stack().unstack('varname')
    Out[62]:
    varname                       GDP  Gini  educ  pop
    country     countrycode year
    afghanistan AFG         1960  5.6  0.77   8.1  888
                            1961  5.7  0.78   8.2  889
                            1962  5.8  0.75   8.3  890
    albania     ALB         1960  6.6  0.45   6.2  777
                            1961  6.7  0.46   6.3  778
                            1962  6.8  0.47   6.4  779
    

    The latter is a frame with a MutliIndex, you can do reset_index to move the MultiIndex to regular columns.