I'm sorry if a similar question has been answered already, but I can't seem to find any posts that help. I wish to define two separate intervention groups (linked to this previous question I asked here). I have an unbalanced panel data set comprising over 100,000 IDs. One row = one month of data for a specific ID.
Intervention 1: I want to include all the rows of an ID if the ID meets the first condition (Scheme1 == 1) at least once in the data and never meets the other condition (Scheme2 == 0 in every row).
Intervention 2: I want to include all rows of an ID if the ID meets both conditions at least once in the data (Scheme1 ==1 and Scheme2 ==1).
I used code like this to get the ControlGroup:
DF %>% group_by(ID) %>% mutate(totalSchemes=sum(Scheme1+Scheme2)) %>% filter(totalSchemes==0) -> ControlGroup
However, if I try to apply similar code to get the intervention groups, I only get the rows where Scheme1 ==1 and Scheme2 ==0 (intervention 1) or Scheme1 ==1 and Scheme2 ==1 (intervention 2). What I would like for each intervention group is all of the rows of the IDs which enter Scheme 1, or both of the schemes, including the rows in which the ID has not entered the scheme(s).
Intervention 1:
Inter1 <- DF %>% filter(ID %in% (DF %>% filter(Scheme1==1 & Scheme2==0))$ID & !(ID %in% (DF %>% filter(Scheme2==1))$ID))
Intervention 2:
Inter2 <- DF %>% filter(ID %in% (DF %>% filter(Scheme1==1 & Scheme2==1))$ID)
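An alternative to the nested `%in%` filters is a grouped `filter()` with `any()`, which keeps or drops every row of an ID at once. The sketch below is only an illustration: the toy `DF` is made up, and it assumes `Scheme1` and `Scheme2` are coded 0/1 with no missing values (with `NA`s, `any()` can return `NA` and the ID would be dropped).

```r
library(dplyr)

# Hypothetical toy panel: two monthly rows per ID (not the real data)
DF <- data.frame(
  ID      = c(1, 1, 2, 2, 3, 3, 4, 4),
  Scheme1 = c(0, 1, 0, 1, 0, 0, 1, 1),
  Scheme2 = c(0, 0, 0, 1, 0, 0, 0, 1)
)

# Intervention 1: ID enters Scheme1 at least once and never enters Scheme2
Inter1 <- DF %>%
  group_by(ID) %>%
  filter(any(Scheme1 == 1) & !any(Scheme2 == 1)) %>%
  ungroup()

# Intervention 2: ID has at least one row with both schemes active
Inter2 <- DF %>%
  group_by(ID) %>%
  filter(any(Scheme1 == 1 & Scheme2 == 1)) %>%
  ungroup()
```

In this toy data, ID 1 falls in `Inter1`, IDs 2 and 4 fall in `Inter2`, and ID 3 is left for the control group. Because the condition is evaluated per group, rows from months before the ID entered a scheme are kept as well.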