Conditional sum using dplyr


I have a household roster dataset with a question on education enrolment. For each household, I want to count the cases where edu_enrolment == 'no', but only for members aged between 6 and 17. Currently, my code counts all cases regardless of age. My dataset is

df <- structure(list(id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 
4, 5, 5, 5, 5, 5, 5), age = c(47, 15, 8, 35, 30, 17, 5, 3, 23, 
15, 12, 4, 18, 10, 56, 41, 15, 12, 4, 3), edu_enrolment = c(NA, 
"no", "yes", NA, NA, "dnk", "yes", "yes", NA, "no", "no", "yes", 
NA, "yes", NA, NA, "yes", "no", "yes", "yes")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -20L))

and the code I am using is

education <- df |>
  group_by(id) |>
  mutate(
    notenroll_count = ifelse(
      all(is.na(edu_enrolment) | edu_enrolment %in% c('dnk')),
      NA,
      sum(edu_enrolment == 'no', na.rm = TRUE)
    )
  ) |>
  ungroup()

Any help will be greatly appreciated.


Solution

  • Is this what you are intending?

    library(dplyr)  # the .by argument needs dplyr 1.1.0 or later

    df |>
      mutate(mysum = sum(edu_enrolment[age %in% 6:17] == "no", na.rm = TRUE), .by = id)
    
          id   age edu_enrolment mysum
       <dbl> <dbl> <chr>         <int>
     1     1    47 NA                1
     2     1    15 no                1
     3     1     8 yes               1
     4     2    35 NA                0
     5     2    30 NA                0
     6     2    17 dnk               0
     7     2     5 yes               0
     8     2     3 yes               0
     9     3    23 NA                2
    10     3    15 no                2
    11     3    12 no                2
    12     3     4 yes               2
    13     4    18 NA                0
    14     4    10 yes               0
    15     5    56 NA                1
    16     5    41 NA                1
    17     5    15 yes               1
    18     5    12 no                1
    19     5     4 yes               1
    20     5     3 yes               1
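
The code in the question also tries to return NA when a household has no usable answer, which the one-liner above does not do (household 2 gets 0 even though its only member aged 6 to 17 answered "dnk"). If that behaviour is wanted, a possible variation is sketched below. The helper name count_not_enrolled() and its lo/hi arguments are hypothetical, and treating "dnk" the same as NA is an assumption carried over from the question's ifelse() call.

```r
library(dplyr)  # the .by argument needs dplyr 1.1.0 or later

# Sample data from the question.
df <- tibble(
  id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5),
  age = c(47, 15, 8, 35, 30, 17, 5, 3, 23, 15, 12, 4, 18, 10, 56, 41,
          15, 12, 4, 3),
  edu_enrolment = c(NA, "no", "yes", NA, NA, "dnk", "yes", "yes", NA, "no",
                    "no", "yes", NA, "yes", NA, NA, "yes", "no", "yes", "yes")
)

# Hypothetical helper: count "no" answers among members aged lo to hi
# (inclusive). Returns NA when no member in that range gave a usable
# answer, i.e. every answer is NA or "dnk", or nobody is in the range.
count_not_enrolled <- function(enrol, age, lo = 6, hi = 17) {
  in_range <- enrol[!is.na(age) & age >= lo & age <= hi]
  if (all(is.na(in_range) | in_range %in% "dnk")) {
    return(NA_integer_)
  }
  sum(in_range == "no", na.rm = TRUE)
}

result <- df |>
  mutate(notenroll_count = count_not_enrolled(edu_enrolment, age), .by = id)

# One row per household, to inspect the counts.
distinct(result, id, notenroll_count)
```

With the sample data this should give 1, NA, 2, 0 and 1 for households 1 to 5. Comparing ages with >= and <= rather than age %in% 6:17 also covers non-integer ages, should the real data contain any.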