Search code examples
rmeandata-management

How to find the average, max, and min based on multiple sets of criteria


I have found a few options from other posts but I am having trouble reproducing the code for my specific needs.

I have climate data that I would like to find the average based on season from 1910-2015 for specific locations.

Here is an example of what I need: the average PPT for CENTROID_ID c1763_1 in 1911 for Winter (Month 12 of the preceding year [1910], month 1 and 2 of year in question [1911]), Spring (months 3,4,5 in 1911), Summer (months 6,7,8 in 1911) and Fall (months 9,10,11 in 1911). This would then need to be done for all individual centroid IDs for every year. I have over 400 unique CENTROID_IDs covering years 1910-2015.

I envision the new dataframe to have the columns CENTROID_ID, YEAR, WINT_PPT, SPR_PPT, SUM_PPT, FALL_PPT.

 CENTROID_ID    YEAR MONTH  PPT
1       c1763_1 1910     1  52.639
2       c1763_1 1910     2  20.870
3       c1763_1 1910     3  21.706
4       c1763_1 1910     4   9.347
5       c1763_1 1910     5   1.201
6       c1763_1 1910     6  11.267
7       c1763_1 1910     7  41.870
8       c1763_1 1910     8  61.260
9       c1763_1 1910     9  27.815
10      c1763_1 1910    10  67.377
11      c1763_1 1910    11  24.719
12      c1763_1 1910    12  30.212
13      c1763_1 1911     1  88.728
14      c1763_1 1911     2  50.035
15      c1763_1 1911     3  37.720
16      c1763_1 1911     4  12.831
17      c1763_1 1911     5   0.739
18      c1763_1 1911     6  18.198
19      c1763_1 1911     7  74.731
20      c1763_1 1911     8  40.873
21      c1763_1 1911     9  86.340
22      c1763_1 1911    10  36.423
23      c1763_1 1911    11  12.491
24      c1763_1 1911    12  19.428
25      c1763_1 1912     1  11.010
26      c1763_1 1912     2  16.339
27      c1763_1 1912     3  72.017
28      c1763_1 1912     4  25.887
29      c1763_1 1912     5   5.314
30      c1763_1 1912     6   8.595
31      c1763_1 1912     7  47.781
32      c1763_1 1912     8  51.188
33      c1763_1 1912     9  10.931
34      c1763_1 1912    10 119.725
35      c1763_1 1912    11  10.420
36      c1763_1 1912    12   8.777
37      c1763_1 1913     1  27.771
38      c1763_1 1913     2  62.622
39      c1763_1 1913     3  17.533
40      c1763_1 1913     4   8.008
41      c1763_1 1913     5   1.423
42      c1763_1 1913     6   3.773
43      c1763_1 1913     7  42.982
44      c1763_1 1913     8  40.541
45      c1763_1 1913     9  58.495
46      c1763_1 1913    10  22.729
47      c1763_1 1913    11  48.130
48      c1763_1 1913    12  32.049
49      c1763_1 1914     1 104.197
50      c1763_1 1914     2  31.707

Solution

  • Simply assign a SEASON column then aggregate:

    df <- within(df, {
              SEASON <- ifelse(MONTH %in% c(12, 1, 2), 'Winter',
                               ifelse(MONTH %in% c(3, 4, 5), 'Spring',
                                      ifelse (MONTH %in% c(6, 7, 8), 'Summer', 
                                              ifelse(MONTH %in% c(9, 10, 11), 'Fall', NA)
                                      )
                              )
                         )
    
              YEAR <- ifelse(MONTH == 12, YEAR + 1, YEAR)
          })
    
    agg_df <- aggregate(PPT ~ CENTROID_ID + SEASON, df, FUN=mean)
    agg_df
    #   CENTROID_ID SEASON      PPT
    # 1     c1763_1   Fall 43.79958
    # 2     c1763_1 Spring 17.81050
    # 3     c1763_1 Summer 36.92158
    # 4     c1763_1 Winter 39.74171
    

    Should you need multiple aggregations of PPT:

    agg_raw <- aggregate(PPT ~ CENTROID_ID + SEASON, df, 
                         FUN=function(x) c(min=min(x), mean=mean(x), max=max(x)))
    
    agg_df <- do.call(data.frame, agg_raw)
    agg_df
    
    #   CENTROID_ID SEASON PPT.min PPT.mean PPT.max
    # 1     c1763_1   Fall  10.420 43.79958 119.725
    # 2     c1763_1 Spring   0.739 17.81050  72.017
    # 3     c1763_1 Summer   3.773 36.92158  74.731
    # 4     c1763_1 Winter   8.777 39.74171 104.197