Search code examples
pythonpandaspandas-groupbymulti-index

pandas transform 1st mutliindex to rowindex and 2nd multiindex to columnindex


So I have the following code which transforms groups a given dataframe by to columns and calculates the size of each group. This creates a DataFrame with one column and a Multiindex of two values. Afterwards I transfrom the result to put the 2nd Index as columnindex and the 1st index as rowindex.

import pandas as pd
from random import choice

products=['car', 'pen', 'kitchen', 'bed']
df=pd.DataFrame([{'product': choice(products), 'Changing': choice([True, False])} for _ in range(10000)])

df_grp=df.groupby(['product', 'Changing']).size().to_frame()
print(df_grp)
print('-'*50)

d=[]
for i in df_grp.reset_index()['product'].unique():
    d.append({'product':i, 'not changing': df_grp.loc[(i,False)][0], 'changing': df_grp.loc[(i,True)][0]})
print(pd.DataFrame(d).set_index('product'))

this results in the following output

                     0
product Changing      
bed     False     1253
        True      1269
car     False     1244
        True      1275
kitchen False     1236
        True      1271
pen     False     1206
        True      1246
--------------------------------------------------
         not changing  changing
product                        
bed              1253      1269
car              1244      1275
kitchen          1236      1271
pen              1206      1246

this is exactly what I want to achieve. However, my approch seems a bit complicated. Is there a more elegant way to do this? Maybe with da built-in pandas function or a lambda function? Also my approach does not generalize well since I always assume that the second index is a bool.


Solution

  • try via unstack():

    df_grp=df.groupby(['product', 'Changing']).size().unstack()
    

    Finally make use of columns attribute:

    df_grp.columns=[f'{"not "*col}{df_grp.columns.name}' for col in df_grp][::-1]
    #as suggested by @Cyttorak #rename column without hard coding
    #OR
    df_grp.columns=['not changing','changing'] #you have to manually write the names
    

    output of df_grp:

              not changing  changing
    product         
    bed         1210        1226
    car         1220        1272
    kitchen     1238        1277
    pen         1267        1290