Here is my code:
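library(dplyr)  # provides tibble(), %>%, mutate(), if_else(), arrange()

set.seed(23)
data_toy <- tibble(
  family_code = sample(factor(400:410), 1000, replace = TRUE),
  event_type = factor(sample(c("sad", "happy"), 1000,
                             replace = TRUE, prob = c(.2, .8))),
  score = sample(1:100, 1000, replace = TRUE)
) %>%
  # happy events carry no score; NA_integer_ keeps the column type integer
  mutate(score = if_else(event_type == "happy", NA_integer_, score)) %>%
  arrange(family_code)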
Output:
family_code event_type score
<fct> <fct> <int>
1 400 happy NA
2 400 happy NA
3 400 happy NA
4 400 happy NA
5 400 sad 57
6 400 happy NA
7 400 happy NA
8 400 happy NA
9 400 happy NA
10 400 sad 65
I would like to create a feature that counts the number of happy events until a sad event for each family.
In the example I shared, my desired output would be:
family_code event_type score happy_counter
<fct> <fct> <int> <dbl>
1 400 happy NA NA
2 400 happy NA NA
3 400 happy NA NA
4 400 happy NA NA
5 400 sad 57 4
6 400 happy NA NA
7 400 happy NA NA
8 400 happy NA NA
9 400 happy NA NA
10 400 sad 65 4
11 400 happy NA NA
12 400 happy NA NA
13 400 happy NA NA
14 400 happy NA NA
15 400 happy NA NA
16 400 happy NA NA
17 400 happy NA NA
18 400 happy NA NA
19 400 sad 79 8
20 400 sad 78 0
My data has approx. 10k observations. I tried group_by and nest_by but struggled with zeroing the count after each sad event.
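Try the following. It labels each run of identical event_type values with consecutive_id(), takes the length of the preceding run within each family, and keeps that count only on the first row of each sad run: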
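library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0
out <- data_toy %>%
  # label each run of identical event_type values, then group by family and run
  group_by(family_code, ind = consecutive_id(event_type)) %>%
  mutate(n = n()) %>%        # length of each run
  slice_head(n = 1) %>%      # keep one row per run
  group_by(family_code) %>%
  # length of the previous run; NA^TRUE is NA and NA^FALSE is 1,
  # so happy runs become NA and sad runs keep the lagged count
  mutate(n = lag(n) * NA^(event_type == "happy")) %>%
  ungroup() %>%
  select(ind, family_code, event_type, happy_counter = n) %>%
  # attach the per-run counts back to the full data
  left_join(data_toy %>%
              mutate(ind = consecutive_id(event_type)), .,
            by = c("family_code", "event_type", "ind")) %>%
  group_by(family_code, ind) %>%
  # within a sad run, only the first row keeps the count; later rows get 0
  mutate(happy_counter = happy_counter * (all(event_type == "sad") &
                                            !duplicated(happy_counter))) %>%
  ungroup()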
Output:
head(out, 20)
# A tibble: 20 × 5
family_code event_type score ind happy_counter
<fct> <fct> <int> <int> <dbl>
1 400 happy NA 1 NA
2 400 happy NA 1 NA
3 400 happy NA 1 NA
4 400 happy NA 1 NA
5 400 sad 57 2 4
6 400 happy NA 3 NA
7 400 happy NA 3 NA
8 400 happy NA 3 NA
9 400 happy NA 3 NA
10 400 sad 65 4 4
11 400 happy NA 5 NA
12 400 happy NA 5 NA
13 400 happy NA 5 NA
14 400 happy NA 5 NA
15 400 happy NA 5 NA
16 400 happy NA 5 NA
17 400 happy NA 5 NA
18 400 happy NA 5 NA
19 400 sad 79 6 8
20 400 sad 78 6 0
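As a possible alternative, the same counter can be sketched with a cumulative sum instead of a join. This is not the approach used above, only a minimal illustration: the helper name count_happy_before_sad is hypothetical, and it assumes the rows are already in event order within each family. One behavioural difference to be aware of: if a family starts with a sad event, this sketch returns 0 for that row, whereas the join-based code above returns NA.

```r
# Hypothetical helper: for each "sad" row, count the "happy" rows since the
# previous "sad" row (or since the start of the vector). Happy rows get NA.
count_happy_before_sad <- function(event_type) {
  is_happy <- event_type == "happy"
  cum_happy <- cumsum(is_happy)          # running total of happy events
  sad_idx <- which(!is_happy)            # positions of sad events
  out <- rep(NA_real_, length(event_type))
  # difference of the running total between consecutive sad rows
  out[sad_idx] <- diff(c(0, cum_happy[sad_idx]))
  out
}

# First 20 events of family 400 from the question
ev <- c(rep("happy", 4), "sad", rep("happy", 4), "sad",
        rep("happy", 8), "sad", "sad")
print(count_happy_before_sad(ev))
# the four sad rows get 4, 4, 8, 0; all happy rows are NA
```

Applied per family, this would look like data_toy %>% group_by(family_code) %>% mutate(happy_counter = count_happy_before_sad(event_type)) %>% ungroup(), which avoids the intermediate ind column and the join.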