Search code examples
pythonpandaspivot-tablemulti-index

How to change the grouping of a column multi-index in a pandas pivot table?


Let's say I have a dataframe like this:

data = {'City': ['Rochester', 'Anaheim', 'Toledo', 'Rochester', 'Anaheim', 'Anaheim', 'Toledo', 'Rochester', 'Rochester', 'Rochester', 'Toledo', 'Toledo', 'Toledo', 'Anaheim'],
        'PersonID': [4930, 7343, 4368, 6909, 4574, 4086, 5024, 3642, 9997, 4745, 1207, 6081, 7832, 6309],
        'MoneySpent': [100, 1710, 20, 910, 2040, 1100, 490, 70, 1940, 100, 1240, 80, 1420, 2090],
        'StayDuration': ['< 2 days', '2-7 days', '2-7 days', '7-30 days', '7-30 days', '< 2 days', '2-7 days', '7-30 days', '7-30 days', '2-7 days', '7-30 days', '< 2 days', '< 2 days', '7-30 days']
       }

df = pd.DataFrame(data)
    City        PersonID    MoneySpent  StayDuration
0   Rochester   4930        100         < 2 days
1   Anaheim     7343        1710        2-7 days
2   Toledo      4368        20          2-7 days
3   Rochester   6909        910         7-30 days
4   Anaheim     4574        2040        7-30 days
5   Anaheim     4086        1100        < 2 days
6   Toledo      5024        490         2-7 days
7   Rochester   3642        70          7-30 days
8   Rochester   9997        1940        7-30 days
9   Rochester   4745        100         2-7 days
10  Toledo      1207        1240        7-30 days
11  Toledo      6081        80          < 2 days
12  Toledo      7832        1420        < 2 days
13  Anaheim     6309        2090        7-30 days

I am then building a pivot table to show the number of people and their total expenses by stay duration for each city:

pv = pd.pivot_table(df,
                    index='City',
                    columns='StayDuration',
                    values=['PersonID', 'MoneySpent'],
                    aggfunc={'PersonID': 'count', 'MoneySpent': 'sum'}
                   )

What I'm seeing is metrics at the first level (headcount or expenses), then categories inside them:

                                      MoneySpent                            PersonID
StayDuration    2-7 days    7-30 days   < 2 days    2-7 days    7-30 days   < 2 days
City                        
Anaheim         1710        4130        1100        1           2           1
Rochester       100         2920        100         1           3           1
Toledo          510         1240        1500        2           1           2

What I want is to have categories first, and metrics inside them, like this:

            2-7 days                7-30 days               < 2 days    
            PersonID   MoneySpent   PersonID   MoneySpent   PersonID   MoneySpent  
Anaheim     1          1710         2          4130         1          1100
Rochester   1          100          3          2920         1          100
Toledo      2          510          1          1240         2          1500

Which is, incidentally, the default view for an Excel pivot table.

It's taking me surprisingly long to figure out how to make Python produce the same result. Is it possible to change the order the columns are grouped in?


Solution

  • As far as I know, pandas pivot will always sort the columns in that fashion. You will require some manipulations to get the output you need:

    pv.swaplevel(0,1,axis=1).sort_index(axis=1).reindex(['PersonID', 'MoneySpent'], level=1, axis=1)
    

    Output:

    StayDuration 2-7 days            7-30 days            < 2 days           
                 PersonID MoneySpent  PersonID MoneySpent PersonID MoneySpent
    City                                                                     
    Anaheim             1       1710         2       4130        1       1100
    Rochester           1        100         3       2920        1        100
    Toledo              2        510         1       1240        2       1500