Search code examples
pythonexcelpandasdataframegroup-by

Python: How do I get multiple results in the same excel sheet?


I have a data set consisting of 10 columns and I am currently using the groupby() function to get averages of the first 6 columns and then using groupby() to get min and max values of the other 4 columns. The issue is when I try to put the results into a new excel sheet, I keep getting errors that 'list' object has no attribute 'to_excel'

avg = df.groupby('column1')[['column2' , 'column3', 'column4' ,'column5',  'column6',  'column7' ]].mean()

c8= (df.assign(c8=df['column8'].abs())
       .groupby(['column1'])['column8'].agg([('min' , 'min'), ('max', 'max')])
       .add_prefix('c8'))

c9= (df.assign(c9=df['column9'].abs())
       .groupby(['column1'])['column9'].agg([('min' , 'min'), ('max', 'max')])
       .add_prefix('c9'))

c10= (df.assign(c10=df['column'].abs())
       .groupby(['column1'])['column10'].agg([('min' , 'min'), ('max', 'max')])
       .add_prefix('c10'))


df = [avg , c8, c9, c10]

#print(df)

wb = pd.ExcelWriter('C:filepath.xlsx' , engine='xlsxwriter')
df.to_excel(wb,sheet_name='Results')
wb.save()
         

Whenever I run any of the functions by themselves, they all work fine, the issue is when I put them together.

What I am expecting is an excel sheet filled with the corresponding data presented like this:

column1 column2 column3 column4 column5 column6 column7 c8min c8max c9min c9max c10min c10max 


Solution

  • I would do it this way with GroupBy.agg in a single shot :

    fmin = lambda x: abs(x).min()
    fmax = lambda x: abs(x).max()
    
    out = (
            df.groupby("column1", as_index=False)
              .agg(column2=("column2", "mean"), column3=("column3", "mean"),
                   column4=("column4", "mean"), column5=("column5", "mean"),
                   column6=("column6", "mean"), column7=("column7", "mean"), # avg
                   c8min=("column8", fmin), c8max=("column8", fmax), # c8
                   c9min=("column9", fmin), c9max=("column9", fmax), # c9
                   c10min=("column10", fmin), c10max=("column10", fmax)) # c10
    )
    
    
    out.to_excel("C:/filepath.xlsx", sheet_name="Results", index=False)
    

    Or following your approach, you can use concat to avoid the AttributeError :

    list_dfs = [avg , c8, c9, c10]
    
    (pd.concat(list_dfs, axis=1).reset_index()
        .to_excel("C:/filepath.xlsx", sheet_name="Results", index=False))
    

    By the way, why you're assigning the new columns cx and not using them ? I suppose you want something like :

    #Example with c8
    c8= (df.assign(c8=df["column8"].abs())
           .groupby(["column1"])["c8"].agg([("min" , "min"), ("max", "max")])
           .add_prefix("c8"))