Search code examples
pythonpandaskaggle

Find average based upon two conditions; create column from these averages


I have a df with weather reporting data. It has over 2 million rows and the following columns.

ID  MONTH  TEMP   
1   1      0      
1   1      10     

2   1      50     
2   1      60     

3   1      80    
3   1      90     

1   2      0      
1   2      10     

2   2      50     
2   2      60     

3   2      80     
3   2      90     

I am looking to create an column for the average monthly temperature. I need a faster way than for-loops. The values for average monthly temperature are from the TEMP column. I would like them to be specific to each ID for each MONTH.

ID  MONTH  TEMP   AVE MONTHLY TEMP
1   1      0      5
1   1      10     5

2   1      50     55
2   1      60     55

3   1      80     85
3   1      90     85

1   2      0      5
1   2      10     5

2   2      50     55
2   2      60     55

3   2      80     85
3   2      90     85

Solution

  • Use groupby.transform:

    df['AVE MONTHLY TEMP']=df.groupby(['ID','MONTH'])['TEMP'].transform('mean')
    print(df)
    

    Output

        ID  MONTH  TEMP  AVE MONTHLY TEMP
    0    1      1     0                 5
    1    1      1    10                 5
    2    2      1    50                55
    3    2      1    60                55
    4    3      1    80                85
    5    3      1    90                85
    6    1      2     0                 5
    7    1      2    10                 5
    8    2      2    50                55
    9    2      2    60                55
    10   3      2    80                85
    11   3      2    90                85