Search code examples
pythonpandasdataframemulti-index

How do I create a MultiIndex Dataframe from a merged table?


I am trying to run analysis on a merged dataframe. Currently, my data is something like this:

data2 = pd.DataFrame({'county':['A', 'B', 'C', 'D'],
                    '1990pop': [100, 200, 150, 400],
                     '1990price': [35, 20, 10, 60],
                     '1991pop': [110, 210, 160, 410],
                     '1991price': [7, 9, 8, 15]})
data2.set_index('county')

I want to create a MultiIndex so for for each row, you see the county, and then each county has an entry for a 'population' and a 'price' variable, and only one column for each for each year 1990 and 1991.

Here's what I'd like the dataframe to look like

I've tried various merging and MultiIndex guides but this is a bit beyond my ability. I'd greatly appreciate any help!


Solution

  • Since years are safely 4 digits, I would create a MultiIndex on the columns, then stack:

    data2.columns = pd.MultiIndex.from_arrays([data2.columns.str[0:4], data2.columns.str[4:]],
                                              names=[None, 'Type'])
    data2.stack(1)
    
                  1990  1991
    county Type             
    A      pop     100   110
           price    35     7
    B      pop     200   210
           price    20     9
    C      pop     150   160
           price    10     8
    D      pop     400   410
           price    60    15