I used to tackle this kind of thing reasonably quickly within DAX, but being new to pandas, I have been stuck for a while on this:
I am trying to output a pivot table showing the % of visa sales per month (columns) and per city (rows).
Here is the output I am looking for:
Jan Feb
London 50.055991 56.435644
Paris 15.119760 67.170191
I've tried various pivot tables and group-by functions, which got me so close and yet so far from what I need. I'm just used to creating "measures" in Excel that I can add to the pivot table like a regular dimension or fact.
Reproducible input:
data = {'Month': {0: 'Jan',
1: 'Jan',
2: 'Jan',
3: 'Jan',
4: 'Feb',
5: 'Feb',
6: 'Feb',
7: 'Feb',
8: 'Feb'},
'City': {0: 'Paris',
1: 'Paris',
2: 'London',
3: 'London',
4: 'Paris',
5: 'Paris',
6: 'London',
7: 'London',
8: 'Paris'},
'Card': {0: 'Visa',
1: 'MasterCard',
2: 'Visa',
3: 'MasterCard',
4: 'Visa',
5: 'MasterCard',
6: 'Visa',
7: 'MasterCard',
8: 'Visa'},
' Amount ': {0: ' $101 ',
1: ' $567 ',
2: ' $447 ',
3: ' $446 ',
4: ' $926 ',
5: ' $652 ',
6: ' $114 ',
7: ' $88 ',
8: ' $408 '}}
df = pd.DataFrame.from_dict(data)
df
Using a pivot_table
, pipe
to compute the ratio, and unstack
to reshape:
df['Amount'] = pd.to_numeric(df['Amount'].str.strip(' $'))
out = (df
.pivot_table(index=['Month', 'City'], columns='Card',
values='Amount', aggfunc='sum')
.pipe(lambda x: x['Visa']/x.sum(axis=1)*100)
.unstack('Month')
)
Output:
Month Feb Jan
City
London 56.435644 50.055991
Paris 67.170191 15.119760
To sort the months:
from calendar import month_abbr
months = {m:i for i, m in enumerate(month_abbr)}
df['Amount'] = pd.to_numeric(df['Amount'].str.strip(' $'))
out = (df
.pivot_table(index=['Month', 'City'], columns='Card',
values='Amount', aggfunc='sum')
.pipe(lambda x: x['Visa']/x.sum(axis=1)*100)
.unstack('Month').sort_index(axis=1, key=lambda x: x.map(months))
)
Output:
Month Jan Feb
City
London 50.055991 56.435644
Paris 15.119760 67.170191