Search code examples
pythonpandaspivot-tablemulti-index

pivot_table: pandas - how to transform pivot output for data frame and access columns?


import numpy as np
import pandas as pd

data = {'experiment_name': ['exp1', 'exp1', 'exp1', 'exp1', 'exp1', 'exp1'], 
'variant': ['A', 'B', 'A','B','A','B'],'sessions_with_orders':[1,2,6,0,23,12],
'total_sessions':[10,23,56,22,89,12]}  

Create DataFrame

df = pd.DataFrame(data)    

final_pivot=(df.pivot_table(index='variant',columns='experiment_name',values=['total_sessions','sessions_with_orders'],aggfunc=np.sum)
.assign(ratio=lambda d: d['sessions_with_orders']/d['total_sessions']) )

final_pivot_reset=final_pivot.reset_index() ##this makes an accessible data frame using .iloc

type(final_pivot_reset)
pandas.core.frame.DataFrame

I don't think final_pivot.reset_index() is doing correct thing?

Further, I have challenge in accessing column/names from the reset output. When I look at the column names they are hierarchal (default output from pivot_table).

final_pivot_reset.columns

MultiIndex([(             'variant',     ''),
            ('sessions_with_orders', 'exp1'),
            (      'total_sessions', 'exp1'),
            (               'ratio',     '')],
           names=[None, 'experiment_name'])

For example, I would like to access final_pivot_reset.ratio how do I achieve it?

I had created a post on how to create custom column on the fly for pivot_table pandas - pivot || create custom column for ratio/percentage

Thank you.


Solution

  • You can flatten MultiIndex after pivot_table with separator _, so change columns names for ratio. Last if need convert index to column variant add DataFrame.reset_index:

    final_pivot=df.pivot_table(index='variant',
                               columns='experiment_name',
                               values=['total_sessions','sessions_with_orders'],
                               aggfunc=np.sum)
    final_pivot.columns = [f'{a}_{b}' for a, b in final_pivot.columns] 
    final_pivot = (final_pivot.assign(ratio=lambda d: d['sessions_with_orders_exp1']/d['total_sessions_exp1'])
                              .reset_index())
    print (final_pivot)
      variant  sessions_with_orders_exp1  total_sessions_exp1     ratio
    0       A                         30                  155  0.193548
    1       B                         14                   57  0.245614