Search code examples
pythonpandasgroup-bypivot-tablepercentage

Pandas - Percentage from total in pivot table


I used to tackle this kind of thing reasonably quickly within DAX, but being new to pandas, I have been stuck for a while on this:

I am trying to output a pivot table showing the % of visa sales per month (columns) and per city (rows).

Here is the output I am looking for:

              Jan        Feb                     
London  50.055991  56.435644
Paris   15.119760  67.170191

I've tried various pivot tables and group-by functions, which got me so close and yet so far from what I need. I'm just used to creating "measures" in Excel that I can add to the pivot table like a regular dimension or fact.

Reproducible input:

data = {'Month': {0: 'Jan',
  1: 'Jan',
  2: 'Jan',
  3: 'Jan',
  4: 'Feb',
  5: 'Feb',
  6: 'Feb',
  7: 'Feb',
  8: 'Feb'},
 'City': {0: 'Paris',
  1: 'Paris',
  2: 'London',
  3: 'London',
  4: 'Paris',
  5: 'Paris',
  6: 'London',
  7: 'London',
  8: 'Paris'},
 'Card': {0: 'Visa',
  1: 'MasterCard',
  2: 'Visa',
  3: 'MasterCard',
  4: 'Visa',
  5: 'MasterCard',
  6: 'Visa',
  7: 'MasterCard',
  8: 'Visa'},
 ' Amount ': {0: ' $101 ',
  1: ' $567 ',
  2: ' $447 ',
  3: ' $446 ',
  4: ' $926 ',
  5: ' $652 ',
  6: ' $114 ',
  7: ' $88 ',
  8: ' $408 '}}

df = pd.DataFrame.from_dict(data)
df

Solution

  • Using a pivot_table, pipe to compute the ratio, and unstack to reshape:

    df['Amount'] = pd.to_numeric(df['Amount'].str.strip(' $'))
    
    out = (df
       .pivot_table(index=['Month', 'City'], columns='Card',
                    values='Amount', aggfunc='sum')
       .pipe(lambda x: x['Visa']/x.sum(axis=1)*100)
       .unstack('Month')
    )
    

    Output:

    Month         Feb        Jan
    City                        
    London  56.435644  50.055991
    Paris   67.170191  15.119760
    

    To sort the months:

    from calendar import month_abbr
    
    months = {m:i for i, m in enumerate(month_abbr)}
    
    df['Amount'] = pd.to_numeric(df['Amount'].str.strip(' $'))
    
    out = (df
       .pivot_table(index=['Month', 'City'], columns='Card',
                    values='Amount', aggfunc='sum')
       .pipe(lambda x: x['Visa']/x.sum(axis=1)*100)
       .unstack('Month').sort_index(axis=1, key=lambda x: x.map(months))
    )
    

    Output:

    Month         Jan        Feb
    City                        
    London  50.055991  56.435644
    Paris   15.119760  67.170191