I have a household roster dataset with a question on education, and I want to count the cases where edu_enrolment == 'no',
but only for members aged 6 to 17. Currently, my code counts all such cases regardless of age.
My dataset is:
df <- structure(list(id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4,
4, 5, 5, 5, 5, 5, 5), age = c(47, 15, 8, 35, 30, 17, 5, 3, 23,
15, 12, 4, 18, 10, 56, 41, 15, 12, 4, 3), edu_enrolment = c(NA,
"no", "yes", NA, NA, "dnk", "yes", "yes", NA, "no", "no", "yes",
NA, "yes", NA, NA, "yes", "no", "yes", "yes")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -20L))
and the code I am using is:
library(dplyr)

# Counts every 'no' in the household, with no age filter yet
education <- df |>
  group_by(id) |>
  mutate(
    notenroll_count = ifelse(
      all(is.na(edu_enrolment) | edu_enrolment %in% c('dnk')),
      NA,
      sum(edu_enrolment == 'no', na.rm = TRUE)
    )
  ) |>
  ungroup()
Any help will be greatly appreciated.
Is this what you intended? Subsetting edu_enrolment by age inside sum() restricts the count to members aged 6 to 17:
library(dplyr)  # the .by argument needs dplyr >= 1.1.0

df |>
  mutate(mysum = sum(edu_enrolment[age %in% 6:17] == "no", na.rm = TRUE), .by = id)
      id   age edu_enrolment mysum
   <dbl> <dbl> <chr>         <int>
 1     1    47 NA                1
 2     1    15 no                1
 3     1     8 yes               1
 4     2    35 NA                0
 5     2    30 NA                0
 6     2    17 dnk               0
 7     2     5 yes               0
 8     2     3 yes               0
 9     3    23 NA                2
10     3    15 no                2
11     3    12 no                2
12     3     4 yes               2
13     4    18 NA                0
14     4    10 yes               0
15     5    56 NA                1
16     5    41 NA                1
17     5    15 yes               1
18     5    12 no                1
19     5     4 yes               1
20     5     3 yes               1
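
Note that household 2 gets 0 here, whereas the code in the question returns NA when every answer is missing or 'dnk'. If that behaviour matters, the following is a minimal sketch of one way to keep it while still restricting to ages 6 to 17. It assumes the NA rule should apply only to members in that age range, which the question does not state; in_range is a helper column introduced here for illustration, and between() is used so that non-integer ages would also be handled.

```r
library(dplyr)  # .by needs dplyr >= 1.1.0

# Sample data from the question
df <- tibble(
  id  = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5),
  age = c(47, 15, 8, 35, 30, 17, 5, 3, 23, 15, 12, 4, 18, 10,
          56, 41, 15, 12, 4, 3),
  edu_enrolment = c(NA, "no", "yes", NA, NA, "dnk", "yes", "yes", NA,
                    "no", "no", "yes", NA, "yes", NA, NA, "yes", "no",
                    "yes", "yes")
)

result <- df |>
  mutate(
    # Hypothetical helper column: TRUE for members aged 6 to 17 inclusive
    in_range = between(age, 6, 17),
    # Assumption: NA when no in-range member gave a usable answer
    # (all NA or 'dnk', or no in-range members at all); otherwise count 'no'
    notenroll_count = if (all(is.na(edu_enrolment[in_range]) |
                              edu_enrolment[in_range] %in% "dnk")) {
      NA_integer_
    } else {
      sum(edu_enrolment[in_range] == "no", na.rm = TRUE)
    },
    .by = id
  )

# One row per household, to check the counts
distinct(result, id, notenroll_count)
```

On the sample data this should give 1, NA, 2, 0 and 1 for households 1 to 5. Drop in_range afterwards with select(-in_range) if you do not want to keep it.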