Search code examples
pandasaverage

How to average column values every n rows in pandas


Could you let know how to average column values with same value in the first column as below? I'd like to calculate calculate every n rows of all columns. for example, the average of all the values from a1~a6 with site value '1' equals to 3. (my data has numerous columns and rows)

raw = {'site':['1','1','1','1','2','2','2','2'],
       'a1':['1','2','3','4','5','6','7','8'],
       'a2':['2','3','4','5','6','7','8','9'],
       'a3':['2','3','4','5','6','7','8','9'],
       'a4':['2','3','4','5','6','7','8','9'],
       'a5':['2','3','4','5','6','7','8','9'],
       'a6':['2','3','4','5','6','7','8','9']
      }
df_tmp = pd.DataFrame(raw,columns = ['site','a1','a2','a3','a4','a5','a6'])
print(df_tmp)

Thank you


Solution

  • IIUC, DataFrame.melt + mean for each site with GroupBy.mean

    # df_tmp = df_tmp.astype(int) # get correct result
    df_tmp.melt('site').groupby('site')['value'].mean()
    

    Or:

    # df_tmp = df_tmp.astype(int) # get correct result
    df_tmp.set_index('site').stack().groupby(level=0).mean()
    #df_tmp.set_index('site').stack().mean(level=0) # .mean(level=0) deprecated 
    

    Output

    site
    1    3.333333
    2    7.333333
    Name: value, dtype: float64