Search code examples
python-3.xpandasdataframepivotexport-to-excel

How to group by column in a dataframe and create pivot tables in a loop


I have the following table df .

ID  CATEG   LEVEL   COLS    VALUE   COMMENT
1    A       3      Apple    388    comment1
1    A       3      Orange   204    comment1
1    A       2      Orange   322    comment1
1    A       1      Orange   716    comment1
1    A       1      Apple    282    comment1
1    A       2      Apple    555    comment1
1    A              Berry    289    comment1
2    A              Car      316    comment1
1    B              Berry    297    comment1
1    B       3      Apple    756    comment1
1    B       2      Apple    460    comment1
1    B       3      Orange   497    comment1
1    B       2      Orange   831    comment1
1    B       1      Orange   225    comment1
1    B       1      Apple    395    comment1
2    B              Car      486    comment1
1    C       2      Orange   320    comment1
1    C       1      Orange   208    comment1
1    C       1      Apple    464    comment1
1    C       2      Apple    613    comment1
1    C       3      Apple    369    comment1
1    C              Berry    474    comment1
2    C              Car      888    comment1
1    C       3      Orange   345    comment1
2    B              Car      664    comment2

I want to create this view in dataframe and write in excel for each group of ID.Example for ID 1. In my sample there is only one comment so sheet name be like ID_COMMENT like 1_comment1:-

  Berry     Apple     Orange        
         1   2   3  1   2   3
A   289 388 555 282 204 322 716
B   297 756 460 395 497 831 225
C   474 369 613 464 345 320 208

If LEVEL is None/na I should be able to create/ split the df based on COLS and comments alone with name "ID_NULL_COMMENT" as sheet name like:- 2_NULL_comment1 sheet :-

   CATEG    Car
     A      316
     B      486
     C      888

2_NULL_comment2 sheet :-

CATEG   Car
 B      664

what I tried :

from pandas import ExcelWriter
writer = ExcelWriter('Values.xlsx')
distinct_id_df= np.unique(df[['ID']], axis=0)   
for ID in  distinct_id_df.iloc[:,0] :
    sample_df = pd.DataFrame()
    for df in sample_df:
        for i in(distinct_id_df):
            distinct_id_df = df.groupby['ID'].pivot_table('VALUE', ['LEVEL','CATEEG'],'COLS')
        sample_df = sample_df.append(df)
        print(sample_df.shape, '===>', datetime.now())
    sample_df.to_excel(writer,'{}''{}'.format(id).format(comments),index= False)

writer.save()

This is not correct clearly, Im unable to do the pivot correctly and also stuck on how to loop correctly to place in different sheet.


Solution

  • Use:

    df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2], 'CATEG': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'B'], 'LEVEL': [3.0, 3.0, 2.0, 1.0, 1.0, 2.0,  np.nan,  np.nan,  np.nan, 3.0, 2.0, 3.0, 2.0, 1.0, 1.0,  np.nan, 2.0, 1.0, 1.0, 2.0, 3.0,  np.nan,  np.nan, 3.0,  np.nan], 'COLS': ['Apple', 'Orange', 'Orange', 'Orange', 'Apple', 'Apple', 'Berry', 'Car', 'Berry', 'Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Apple', 'Car', 'Orange', 'Orange', 'Apple', 'Apple', 'Apple', 'Berry', 'Car', 'Orange', 'Car'], 'VALUE': [388, 204, 322, 716, 282, 555, 289, 316, 297, 756, 460, 497, 831, 225, 395, 486, 320, 208, 464, 613, 369, 474, 888, 345, 664], 'COMMENT': ['comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment1', 'comment2']})
    

    #check misisng values
    mask = df['LEVEL'].isna()
    
    #split DataFrames for different processing
    df1 = df[~mask]
    df2 = df[mask]
    
    #pivoting with differnet columns parameters
    df1 = df1.pivot_table(index=['ID','COMMENT','CATEG'], 
                          columns=['COLS','LEVEL'],
                          values='VALUE')
    # print (df1)
    
    df2 = df2.pivot_table(index=['ID','COMMENT','CATEG'], columns='COLS',values='VALUE')
    # print (df1)
    
    from pandas import ExcelWriter
    with pd.ExcelWriter('Values.xlsx') as writer: 
        
        #groupby by first 2 levels ID, COMMENT
        for (ids,comments), sample_df in df1.groupby(['ID','COMMENT']):
            #removed first 2 levels, also removed only NaNs columns
            df = sample_df.reset_index(level=[1], drop=True).dropna(how='all', axis=1)
            #new sheetnames by f-strings
            name = f'{ids}_{comments}'
            #write to file
            df.to_excel(writer,sheet_name=name)
            
        for (ids,comments), sample_df in df2.groupby(['ID','COMMENT']):
            df = sample_df.reset_index(level=[1], drop=True).dropna(how='all', axis=1)
            name = f'{ids}_NULL_{comments}'
            df.to_excel(writer,sheet_name=name)
    

    Another solution without repeating code:

    mask = df['LEVEL'].isna()
    
    dfs = {'no_null': df[~mask], 'null': df[mask]}
    
    from pandas import ExcelWriter
    with pd.ExcelWriter('Values.xlsx') as writer: 
        
        for k, v in dfs.items():
            if k == 'no_null':
                add = ''
                cols = ['COLS','LEVEL']
            else:
                 add = 'NULL_'
                 cols = 'COLS'
            
            df = v.pivot_table(index=['ID','COMMENT','CATEG'], columns=cols, values='VALUE')
              
            for (ids,comments), sample_df in df.groupby(['ID','COMMENT']):
                df = sample_df.reset_index(level=[1], drop=True).dropna(how='all', axis=1)
                name = f'{ids}_{add}{comments}'
                df.to_excel(writer,sheet_name=name)