python pandas group-by pivot-table percentage

Pandas - Percentage from total in pivot table

I used to tackle this kind of thing reasonably quickly within DAX, but being new to pandas, I have been stuck for a while on this:

I am trying to output a pivot table showing the % of visa sales per month (columns) and per city (rows).

Here is the output I am looking for:

              Jan        Feb                     
London  50.055991  56.435644
Paris   15.119760  67.170191

I've tried various pivot tables and group-by functions, which got me so close and yet so far from what I need. I'm just used to creating "measures" in Excel that I can add to the pivot table like a regular dimension or fact.

Reproducible input:

data = {'Month': {0: 'Jan',
  1: 'Jan',
  2: 'Jan',
  3: 'Jan',
  4: 'Feb',
  5: 'Feb',
  6: 'Feb',
  7: 'Feb',
  8: 'Feb'},
 'City': {0: 'Paris',
  1: 'Paris',
  2: 'London',
  3: 'London',
  4: 'Paris',
  5: 'Paris',
  6: 'London',
  7: 'London',
  8: 'Paris'},
 'Card': {0: 'Visa',
  1: 'MasterCard',
  2: 'Visa',
  3: 'MasterCard',
  4: 'Visa',
  5: 'MasterCard',
  6: 'Visa',
  7: 'MasterCard',
  8: 'Visa'},
 ' Amount ': {0: ' $101 ',
  1: ' $567 ',
  2: ' $447 ',
  3: ' $446 ',
  4: ' $926 ',
  5: ' $652 ',
  6: ' $114 ',
  7: ' $88 ',
  8: ' $408 '}}

df = pd.DataFrame.from_dict(data)
df

Solution

Using a pivot_table, pipe to compute the ratio, and unstack to reshape:

df['Amount'] = pd.to_numeric(df['Amount'].str.strip(' $'))

out = (df
   .pivot_table(index=['Month', 'City'], columns='Card',
                values='Amount', aggfunc='sum')
   .pipe(lambda x: x['Visa']/x.sum(axis=1)*100)
   .unstack('Month')
)

Output:

Month         Feb        Jan
City                        
London  56.435644  50.055991
Paris   67.170191  15.119760

To sort the months:

from calendar import month_abbr

months = {m:i for i, m in enumerate(month_abbr)}

df['Amount'] = pd.to_numeric(df['Amount'].str.strip(' $'))

out = (df
   .pivot_table(index=['Month', 'City'], columns='Card',
                values='Amount', aggfunc='sum')
   .pipe(lambda x: x['Visa']/x.sum(axis=1)*100)
   .unstack('Month').sort_index(axis=1, key=lambda x: x.map(months))
)

Output:

Month         Jan        Feb
City                        
London  50.055991  56.435644
Paris   15.119760  67.170191