
Average rate for 20 years, 10 years, and 5 years in R


I am trying to find the average rate of a certain virus for 2002-2021, 2002-2012, and 2002-2007, grouped by another variable, "Jurisdiction". The code I have right now is:

avgrate20 <- ratesmerge %>%
  group_by(Jurisdiction) %>%
  summarize(
    Years = paste(range(2002:2021), collapse = "-"),
    across(starts_with("rate"), mean)
  )

When I change Years = paste(range(2002:2021), collapse = "-") to use 2002:2012 instead, it still takes the mean over 2002-2021.

Here is my output from head(df) (image not included).

Any help would be appreciated.


Solution

  • Years = paste(range(2002:2021), collapse = "-") only creates a column called Years that holds the string "2002-2021"; it doesn't tell R which rows to include when computing the mean. To restrict the rows, filter on the year column (MMWR_YEAR here) with dplyr::filter() before grouping and summarizing.

    library(dplyr)
    
    yrs_wanted <- 2002:2021
    
    avgrate20 <- ratesmerge %>%
      filter(MMWR_YEAR %in% yrs_wanted) %>%
      group_by(Jurisdiction) %>%
      summarize(
        Years = paste(range(yrs_wanted), collapse = "-"),
        across(starts_with("rate"), mean)
      )
    

    If you want to get fancy, you can loop through your year ranges using purrr::map_dfr():

    library(dplyr)
    library(purrr)
    
    year_ranges <- list(
      2002:2021,
      2002:2012,
      2002:2007
    )
    
    avgrates <- map_dfr(
      year_ranges,
      ~ ratesmerge %>%
        filter(MMWR_YEAR %in% .x) %>%
        group_by(Jurisdiction) %>%
        summarize(
          Years = paste(range(.x), collapse = "-"),
          across(starts_with("rate"), mean)
        )
    )
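
To check that filtering really changes the result, here is a minimal sketch of the same approach on made-up data. The `toy` tibble, its values, the single `rate` column, and the helper `avg_for_range()` are all hypothetical stand-ins for `ratesmerge`; only the column names `Jurisdiction` and `MMWR_YEAR` come from the code above. It also swaps `map_dfr()` for `map()` followed by `list_rbind()`, since `map_dfr()` is superseded as of purrr 1.0.0; this assumes purrr 1.0.0 or later is installed.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for ratesmerge (values are made up)
toy <- tibble(
  Jurisdiction = rep(c("A", "B"), each = 4),
  MMWR_YEAR    = rep(c(2002, 2007, 2012, 2021), times = 2),
  rate         = c(1, 3, 5, 7, 10, 20, 30, 40)
)

year_ranges <- list(2002:2021, 2002:2012, 2002:2007)

# Hypothetical helper: mean of every "rate*" column per jurisdiction,
# using only the rows whose year falls inside `yrs`
avg_for_range <- function(yrs, data) {
  data %>%
    filter(MMWR_YEAR %in% yrs) %>%
    group_by(Jurisdiction) %>%
    summarize(
      Years = paste(range(yrs), collapse = "-"),
      across(starts_with("rate"), mean)
    )
}

# One summary tibble per range, then stack them row-wise
avgrates_toy <- year_ranges %>%
  map(function(yrs) avg_for_range(yrs, toy)) %>%
  list_rbind()

print(avgrates_toy)
```

With this toy data the mean for jurisdiction A is 4 over 2002-2021, 3 over 2002-2012, and 2 over 2002-2007, so each range is now computed from its own subset of rows. If the real rate columns contain missing values, passing an anonymous function such as `function(x) mean(x, na.rm = TRUE)` to `across()` may be needed, though that depends on how the data should be treated.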