Search code examples
rdplyraggregatesummarize

Aggregate dataframe in R to get mean of groups based on "detect flag"


I have the data frame:

    structure(list(Loc_name = c("U3R", "U3R", "U3R", "U3R", "U3R", 
"U3R", "HWY301", "HWY301", "HWY301", "HWY301", "HWY301", "HWY301", 
"U3R", "U3R", "U3R", "U3R", "U3R", "U3R", "HWY301", "HWY301", 
"HWY301", "HWY301", "HWY301", "HWY301"), `fish type` = c("bass", 
"bass", "bass", "catfish", "catfish", "catfish", "flathead", 
"flathead", "flathead", "bass", "bass", "bass", "bass", "bass", 
"bass", "catfish", "catfish", "catfish", "flathead", "flathead", 
"flathead", "bass", "bass", "bass"), Report_result_value = c(1.1, 
1.2, 1.3, 2.1, 2.2, 2.3, 3.1, 3.2, 3.3, 4.1, 4.2, 4.3, 1.1, 1.2, 
1.3, 2.1, 2.2, 2.3, 3.1, 3.2, 3.3, 4.1, 4.2, 4.3), Chemical_Name = c("Cs-137", 
"Cs-137", "Cs-137", "Cs-137", "Cs-137", "Cs-137", "Cs-137", "Cs-137", 
"Cs-137", "Cs-137", "Cs-137", "Cs-137", "SR-90", "SR-90", "SR-90", 
"SR-90", "SR-90", "SR-90", "SR-90", "SR-90", "SR-90", "SR-90", 
"SR-90", "SR-90"), detect_flag = c("Y", "Y", "Y", "N", "Y", "N", 
"N", "N", "N", "Y", "N", "N", "Y", "Y", "Y", "N", "Y", "N", "N", 
"N", "N", "Y", "N", "N")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -24L))

I would like to get the average report result value of a particular chemical, in a particular fish, at a particular location. The rules for taking the average are: if the detect flag is "N" for a particular fish type, at a particular location, for a particular chemical name, then the mean is zero. Otherwise, the mean is the straight average, regardless of detect flag. For example, the average of SR-90 for fish "Flathead" at HWY301 would be 0. Conversely, the average of Cs-137 for fish "bass" at HWY301 would be 4.2.

I've heard Loops are slow in R, so I'm trying to avoid using a loop as much as possible. Does anyone have an approach in R for calculating the mean considering the rules for averaging above? Much appreciated.

I have tried aggregate and dplyr's summarize function but I have not been able to accommodate the averaging rule with those approaches. I'm fairly new to R.


Solution

  • Based on the comments "there has to be at least one detect_flag = 'Y' for the average to be calculated normally, otherwise it's 0":

    fish_data %>% 
      group_by(Loc_name, `fish type`, Chemical_Name) %>% 
      summarise(meanVal = ifelse("Y" %in% detect_flag, 
                                 mean(Report_result_value), 
                                 0)) %>% 
      ungroup()
    

    Result:

    # A tibble: 8 × 4
      Loc_name `fish type` Chemical_Name meanVal
      <chr>    <chr>       <chr>           <dbl>
    1 HWY301   bass        Cs-137            4.2
    2 HWY301   bass        SR-90             4.2
    3 HWY301   flathead    Cs-137            0  
    4 HWY301   flathead    SR-90             0  
    5 U3R      bass        Cs-137            1.2
    6 U3R      bass        SR-90             1.2
    7 U3R      catfish     Cs-137            2.2
    8 U3R      catfish     SR-90             2.2