I have data in a household roster, as in the data frame below:
hhroster <- data.frame(HHID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                       INDID = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 3, 1, 2, 1),
                       response_1 = c("yes", "no", "yes", "yes", "no", "no", "no", "no", "no", "yes", "yes", "no", "yes", "yes", "no"),
                       response_2 = c("no", "no", "yes", "no", "no", "no", "yes", "no", "no", "no", "no", "no", "yes", "yes", "no"))
I would like to create a household-level dummy variable that takes the value 1 if at least one individual in the household gave a yes response, and 0 otherwise. The desired output is:
hh <- data.frame(HHID = c(1, 2, 3, 4, 5, 6),
                 HH_response_1 = c(1, 1, 0, 1, 1, 0),
                 HH_response_2 = c(1, 0, 1, 0, 1, 0))
Edit: I have realized that the dataset also contains values such as DK, RF, and missing values. If all of a household's values are among these, the aggregate value should be NA, not 0.
Here is a solution. Use across() to select all columns of interest, and check whether there are any "yes" values by testing whether the sum of the logical values .x == "yes" is greater than zero. You can keep the results as logical; R will coerce FALSE/TRUE to 0/1 if and when necessary.
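As a quick illustration of that coercion, here is a minimal base R sketch. The vector name `flags` is made up for the example.

```r
# A logical vector standing in for the result of .x == "yes"
flags <- c(TRUE, FALSE, TRUE)

sum(flags)         # 2: TRUE counts as 1, FALSE as 0
as.integer(flags)  # 1 0 1: explicit conversion if integer dummies are needed
```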
hhroster <- data.frame(HHID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                       INDID = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 3, 1, 2, 1),
                       response_1 = c("yes", "no", "yes", "yes", "no", "no", "no", "no", "no", "yes", "yes", "no", "yes", "yes", "no"),
                       response_2 = c("no", "no", "yes", "no", "no", "no", "yes", "no", "no", "no", "no", "no", "yes", "yes", "no"))
suppressPackageStartupMessages(
  library(dplyr)
)
hhroster %>%
  summarise(
    across(starts_with("response"), ~ sum(.x == "yes") > 0L),
    .by = HHID
  )
#>   HHID response_1 response_2
#> 1    1       TRUE       TRUE
#> 2    2       TRUE      FALSE
#> 3    3      FALSE       TRUE
#> 4    4       TRUE      FALSE
#> 5    5       TRUE       TRUE
#> 6    6      FALSE      FALSE
Created on 2024-02-10 with reprex v2.0.2
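The follow-up in the question (households whose values are all DK, RF, or missing should get NA rather than 0) is not covered by the code above. One possible way to handle it is sketched below. The helper `any_yes()` and the small `roster` data frame are made up for illustration and are not part of dplyr or of the original data. The sketch also assumes that a household with at least one "no" and no "yes" should get 0 even if its other values are DK or RF, and that `.by` is available (dplyr 1.1.0 or later).

```r
library(dplyr)

# Hypothetical helper: 1 if any value is "yes", 0 if there is no "yes" but at
# least one "no", and NA if every value is something else (DK, RF, or missing).
any_yes <- function(x) {
  if (any(x == "yes", na.rm = TRUE)) {
    1L
  } else if (any(x == "no", na.rm = TRUE)) {
    0L
  } else {
    NA_integer_
  }
}

# Made-up roster containing DK, RF, and NA values
roster <- data.frame(
  HHID = c(1, 1, 2, 2, 3, 3),
  response_1 = c("yes", "DK", "no", "RF", "DK", NA),
  response_2 = c("DK", "RF", NA, "yes", "no", "DK")
)

hh <- roster %>%
  summarise(
    # .names prefixes each output column with "HH_", as in the desired output
    across(starts_with("response"), any_yes, .names = "HH_{.col}"),
    .by = HHID
  )

hh
# HH_response_1 is 1, 0, NA and HH_response_2 is NA, 1, 0 for households 1 to 3
```

Because the helper returns integers directly, the result already matches the 0/1/NA coding asked for, with no later coercion needed.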