python pandas group-by mean pandas-groupby

Assign group averages to each row in python/pandas


I have a dataframe and want to calculate the mean of Sales per store and across all stores, with both values assigned to every row. My code produces the means, but I am looking for a more efficient way.

DF

Cashier#     Store#     Sales    Refunds
001          001        100      1
002          001        150      2
003          001        200      2
004          002        400      1
005          002        600      4

DF-Desired

Cashier#     Store#     Sales    Refunds     Sales_StoreAvg    Sales_All_Stores_Avg
001          001        100      1            150               290
002          001        150      2            150               290
003          001        200      2            150               290
004          002        400      1            500               290
005          002        600      4            500               290

My Attempt

I created two additional dataframes, then did a left join.

df.groupby(['Store#']).sum().reset_index().groupby('Sales').mean() 
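For comparison, a join-based approach like the one described might look roughly like the sketch below. This is a reconstruction, not the asker's actual code: the helper name `store_avg` and the use of `DataFrame.merge` are assumptions, and only one helper frame is built because the all-stores mean is a single number.

```python
import pandas as pd

# Sample data copied from the DF table in the question.
df = pd.DataFrame({
    "Cashier#": [1, 2, 3, 4, 5],
    "Store#": [1, 1, 1, 2, 2],
    "Sales": [100, 150, 200, 400, 600],
    "Refunds": [1, 2, 2, 1, 4],
})

# Helper frame (hypothetical name): one row per store with its mean sales.
store_avg = (
    df.groupby("Store#", as_index=False)["Sales"]
      .mean()
      .rename(columns={"Sales": "Sales_StoreAvg"})
)

# Left join the per-store means back onto the original rows.
merged = df.merge(store_avg, on="Store#", how="left")

# The all-stores mean is a scalar, so it can be assigned directly.
merged["Sales_All_Stores_Avg"] = df["Sales"].mean()

print(merged)
```

This gives the desired columns, but it needs an intermediate frame and a join for each group-level statistic.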

Solution

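  • I think you need GroupBy.transform: it computes the mean per group and returns it aligned to the original rows, so the result can be assigned directly as a new column. The all-stores mean is a single scalar, which pandas broadcasts to every row: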

    df['Sales_StoreAvg'] = df.groupby('Store#')['Sales'].transform('mean')
    df['Sales_All_Stores_Avg'] = df['Sales'].mean()
    print(df)
       Cashier#  Store#  Sales  Refunds  Sales_StoreAvg  Sales_All_Stores_Avg
    0         1       1    100        1             150                 290.0
    1         2       1    150        2             150                 290.0
    2         3       1    200        2             150                 290.0
    3         4       2    400        1             500                 290.0
    4         5       2    600        4             500                 290.0
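To try this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question is typed in by hand with integer columns; the final check against a plain `groupby(...).mean()` is an addition for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the DF table in the question.
df = pd.DataFrame({
    "Cashier#": [1, 2, 3, 4, 5],
    "Store#": [1, 1, 1, 2, 2],
    "Sales": [100, 150, 200, 400, 600],
    "Refunds": [1, 2, 2, 1, 4],
})

# transform("mean") returns one value per original row (same index as df),
# so no intermediate frame or join is needed.
df["Sales_StoreAvg"] = df.groupby("Store#")["Sales"].transform("mean")

# A scalar assigned to a column is broadcast to every row.
df["Sales_All_Stores_Avg"] = df["Sales"].mean()

# Illustrative check: each row's store average matches the per-store mean.
per_store = df.groupby("Store#")["Sales"].mean()
assert (df["Sales_StoreAvg"] == df["Store#"].map(per_store)).all()

print(df)
```

Depending on the pandas version, Sales_StoreAvg may print as 150.0 and 500.0 rather than 150 and 500, since a mean is generally returned as a float.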