r dplyr tidyverse case grepl

Can't figure out how to use case_when and grepl to calculate rates per 100k for different years


As part of a project with COVID data, I need to calculate the rates per 100,000 people for my table. The math itself is easy; the hard part is that there is population data for two different years: 2020 and 2021. I have data from 2020 to 2022. I need to calculate the rate for 2020 using the 2020 population estimate, and the rates for 2021 and 2022 using the 2021 estimate. These all have to go in the same columns.

It was suggested to me in the assignment to use mutate(case_when(grepl(...))), but grepl only returns TRUE or FALSE, and I have no clue where to start. The question and expected answers are shown below:

Edit: code I've tried plus given error:

[image: error code]

[image: question]

[image: desired results]

Any tips on how to use case_when and grepl, or how to build my own function, would be greatly appreciated.

So far, I have tried

us_totals %>%
  mutate(total_deaths = case_when(
                          grepl("2020", date) ~ (deaths / estimated_population_2020) * 100000),
                          TRUE ~ (deaths / estimated_population_2021) * 100000)

I know that's not correct, and I know that grepl and case_when don't take that format, but I'm at a loss.

I've also tried creating my own function to do these calculations, but nothing has worked so far.

For expected results, see above.


Solution

  • Fake data:

    library(dplyr)

    estimated_population_2020 <- 3.0E8 # 300,000,000
    estimated_population_2021 <- 3.1E8 # 310,000,000
    us_totals <- data.frame(date = as.Date(c("2020-01-01", "2021-01-01")),
                            deaths = c(5E5, 6E5))
    

    Code that works:

    us_totals %>%
      mutate(total_deaths = case_when(
        grepl("2020", date) ~ (deaths / estimated_population_2020) * 100000,
        TRUE ~ (deaths / estimated_population_2021) * 100000))
    
            date deaths total_deaths
    1 2020-01-01  5e+05     166.6667
    2 2021-01-01  6e+05     193.5484
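
It may help to see what grepl() contributes on its own: it returns one TRUE/FALSE per row, which is what case_when() expects on the left of each ~. Below is a minimal base-R sketch of the same calculation, assuming the fake data above plus a hypothetical 2022 row that I added to show the fallback case. The column name rate_per_100k and the helper vectors are my own choices, not part of the assignment.

```r
# Assumed fake data, mirroring the example above; the 2022 row is hypothetical
estimated_population_2020 <- 3.0e8
estimated_population_2021 <- 3.1e8
us_totals <- data.frame(date = as.Date(c("2020-01-01", "2021-01-01", "2022-01-01")),
                        deaths = c(5e5, 6e5, 6e5))

# grepl() coerces the Date column to character and returns a logical vector
is_2020 <- grepl("2020", us_totals$date)
print(is_2020)

# Comparing the extracted year makes the intent explicit
is_2020_by_year <- format(us_totals$date, "%Y") == "2020"

# 2020 rows use the 2020 population; 2021 and 2022 rows use the 2021 population
us_totals$rate_per_100k <- ifelse(is_2020_by_year,
                                  us_totals$deaths / estimated_population_2020,
                                  us_totals$deaths / estimated_population_2021) * 100000
print(us_totals)
```

Both logical vectors select the same rows here; the case_when() version above is equivalent to this ifelse() because there are only two cases.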
    

    Code from the question, which fails with the same error. It fails because the extra ) at the end of the grepl line closes the case_when early, so TRUE ~ (deaths / estimated_population_2021) * 100000 is evaluated as a second argument to mutate (a second variable to define after total_deaths) instead of as a second case of the case_when. With that ) removed, you need another ) at the end, so that the final )) closes both the case_when and the mutate.

    us_totals %>%
      mutate(total_deaths = case_when(
        grepl("2020", date) ~ (deaths / estimated_population_2020) * 100000), # EXTRA )
        TRUE ~ (deaths / estimated_population_2021) * 100000) # MISSING )
    
    Error in `mutate()`:
    ℹ In argument: `TRUE ~ (deaths/estimated_population_2021) * 1e+05`.
    Caused by error:
    ! `TRUE ~ (deaths/estimated_population_2021) * 1e+05` must be a vector, not a <formula> object.
    Run `rlang::last_trace()` to see where the error occurred.
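
The `<formula>` in the error message is a useful clue. In R, any expression of the form lhs ~ rhs evaluates to a formula object. case_when() knows how to interpret formulas, but mutate() expects each argument to be a vector, so a TRUE ~ ... line that lands outside case_when() is rejected. A small base-R sketch of that distinction (the variable name stray_case is illustrative only):

```r
# A `lhs ~ rhs` expression on its own evaluates to a formula object
stray_case <- TRUE ~ 1 * 100000

print(class(stray_case))       # "formula"
print(is.numeric(stray_case))  # FALSE: it is not a numeric vector
```

If you see this error again, a mismatched parenthesis just before the reported `TRUE ~ ...` argument is a likely place to look.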