Search code examples
pythonpandaspivot-table

Creating a pivot table with multiple columns


I'm trying to create a pivot table with multiple columns; I'm unsure how to explain this better. But the following is the desired output, dataframe setup, and code I have tried so far.

Desired Output:enter image description here

Dataframe Setup:

data = {
 'WholesalerID': {0: 121, 1: 121, 2: 42, 3: 42, 4: 54, 5: 43, 6: 432, 7: 4245, 8: 4245, 9: 4245, 10: 457},
 'Brand': {0: 'Vans', 1: 'Nike', 2: 'Nike', 3: 'Vans',4: 'Vans', 5: 'Nike', 6: 'Puma', 7: 'Vans', 8: 'Nike', 9: 'Puma', 10: 'Converse'},
 'Shop 1': {0: 'Yes', 1: 'No', 2: 'Yes', 3: 'Maybe', 4: 'Yes', 5: 'No', 6: 'Yes', 7: 'Yes', 8: 'Maybe', 9: 'Maybe', 10: 'No'},
 'Shop 2': {0: 'No', 1: 'Yes', 2: 'Maybe', 3: 'Maybe', 4: 'Yes', 5: 'No', 6: 'No', 7: 'No', 8: 'Maybe', 9: 'Yes', 10: 'Yes'}
}

df = pd.DataFrame.from_dict(data)

Pivoting Attempt:

df = df.assign(count = 1)
pivoted_df = pd.pivot_table(df,
                            index = ['Brand'],
                            columns = ['Shop 1', 'Shop 2'],
                            values = ['count'],
                            aggfunc = {'count': 'count'},
                            fill_value = 0,
                            margins = True,
                            margins_name = 'Total'
)

Solution

  • Use DataFrame.melt for unpivot, then convert column value to Categorical for correct sorting final MultiIndex, reshape by crosstab, add subtotal per first level of columns and append to original with add level Total:

    df = df.melt(id_vars='Brand', value_vars=['Shop 1', 'Shop 2'])
    df['value'] = pd.Categorical(df['value'], ordered=True, categories=df['value'].unique())
                     .add_categories(['Total'])
    
    pivoted_df = pd.crosstab(df['Brand'], [df['variable'], df['value']])
    
    totals = (pd.concat({'Total': pivoted_df.groupby(level=0, axis=1).sum()}, axis=1)
                .swaplevel(axis=1))
    
    pivoted_df = pd.concat([pivoted_df, totals],axis=1).sort_index(axis=1)
    print (pivoted_df)
    variable Shop 1                Shop 2               
    value       Yes No Maybe Total    Yes No Maybe Total
    Brand                                               
    Converse      0  1     0     1      1  0     0     1
    Nike          1  2     1     4      1  1     2     4
    Puma          1  0     1     2      1  1     0     2
    Vans          3  0     1     4      1  2     1     4