Let's say I have a dataframe like this:
data = {'City': ['Rochester', 'Anaheim', 'Toledo', 'Rochester', 'Anaheim', 'Anaheim', 'Toledo', 'Rochester', 'Rochester', 'Rochester', 'Toledo', 'Toledo', 'Toledo', 'Anaheim'],
'PersonID': [4930, 7343, 4368, 6909, 4574, 4086, 5024, 3642, 9997, 4745, 1207, 6081, 7832, 6309],
'MoneySpent': [100, 1710, 20, 910, 2040, 1100, 490, 70, 1940, 100, 1240, 80, 1420, 2090],
'StayDuration': ['< 2 days', '2-7 days', '2-7 days', '7-30 days', '7-30 days', '< 2 days', '2-7 days', '7-30 days', '7-30 days', '2-7 days', '7-30 days', '< 2 days', '< 2 days', '7-30 days']
}
df = pd.DataFrame(data)
City PersonID MoneySpent StayDuration
0 Rochester 4930 100 < 2 days
1 Anaheim 7343 1710 2-7 days
2 Toledo 4368 20 2-7 days
3 Rochester 6909 910 7-30 days
4 Anaheim 4574 2040 7-30 days
5 Anaheim 4086 1100 < 2 days
6 Toledo 5024 490 2-7 days
7 Rochester 3642 70 7-30 days
8 Rochester 9997 1940 7-30 days
9 Rochester 4745 100 2-7 days
10 Toledo 1207 1240 7-30 days
11 Toledo 6081 80 < 2 days
12 Toledo 7832 1420 < 2 days
13 Anaheim 6309 2090 7-30 days
I am then building a pivot table to show the number of people and their total expenses by stay duration for each city:
pv = pd.pivot_table(df,
index='City',
columns='StayDuration',
values=['PersonID', 'MoneySpent'],
aggfunc={'PersonID': 'count', 'MoneySpent': 'sum'}
)
What I'm seeing is metrics at the first level (headcount or expenses), then categories inside them:
MoneySpent PersonID
StayDuration 2-7 days 7-30 days < 2 days 2-7 days 7-30 days < 2 days
City
Anaheim 1710 4130 1100 1 2 1
Rochester 100 2920 100 1 3 1
Toledo 510 1240 1500 2 1 2
What I want is to have categories first, and metrics inside them, like this:
2-7 days 7-30 days < 2 days
PersonID MoneySpent PersonID MoneySpent PersonID MoneySpent
Anaheim 1 1710 2 4130 1 1100
Rochester 1 100 3 2920 1 100
Toledo 2 510 1 1240 2 1500
Which is, incidentally, the default view for an Excel pivot table.
It's taking me surprisingly long to figure out how to make Python produce the same result. Is it possible to change the order the columns are grouped in?
As far as I know, pandas pivot will always sort the columns in that fashion. You will require some manipulations to get the output you need:
pv.swaplevel(0,1,axis=1).sort_index(axis=1).reindex(['PersonID', 'MoneySpent'], level=1, axis=1)
Output:
StayDuration 2-7 days 7-30 days < 2 days
PersonID MoneySpent PersonID MoneySpent PersonID MoneySpent
City
Anaheim 1 1710 2 4130 1 1100
Rochester 1 100 3 2920 1 100
Toledo 2 510 1 1240 2 1500