Use data from multiple years (=rows) to filter another column

I have a df:

df <- data.frame(nr = c(rep("n01", 10),
                    rep("n03", 13),
                    rep("n04", 8),
                    rep("n06", 14),
                    rep("n08", 13),
                    rep("n12", 14)),
             yr = c(2012:2021, 
                    2010:2022,
                    2013:2020,
                    2010:2023,
                    2011:2023,
                    2009:2022),
             md = c(rep("no", 5),
                    rep("yes", 8),
                    rep("no", 13),
                    rep("yes", 1),
                    rep("no", 9),
                    rep("yes", 5),
                    rep("no", 20),
                    rep("yes", 2),
                    rep("no", 9)))

I want to filter my dataset based on the md column for each nr. I want to keep only the nr s that have at least 2x a 'yes' in 2 subsequent years, followed by at least 1 "no". This "no" cannot be followed by a "yes" anymore in further years. But the initial 2x "yes" can be preceded by anything. Meaning that in this dummy df I want to keep only n03, n06 and n12. How can I do that? (preferably with dplyr) Of course my real df is waaay bigger so I need an automated review of the no/yes pattern of all rows belonging to a specific nr.

Solution

You can try the following dplyr approach, which uses cumsum to look at the total number of "yes" by nr then scans for any "no" after the last yes (using max). Then filters and removes temp variables

library(dplyr)

df %>%
  mutate(temp = cumsum(md %in% "yes"),
         temp2 = temp == max(temp) & md %in% "no",
         .by = nr) %>%
  filter(any(temp2) & temp >= 2, .by = nr) %>% 
  select(-starts_with("temp"))

Note you could do all this in one messy filter statement and remove the need to create temp variables:

df %>%
  filter(any(cumsum(md %in% "yes") == max(cumsum(md %in% "yes")) &
               md %in% "no") & 
           cumsum(md %in% "yes") >= 2, 
         .by = nr)

Output:

   nr   yr  md
1  n03 2011 yes
2  n03 2012 yes
3  n03 2013  no
4  n03 2014  no
5  n03 2015  no
6  n03 2016  no
7  n03 2017  no
8  n03 2018  no
9  n03 2019  no
10 n03 2020  no
11 n03 2021  no
12 n03 2022  no
13 n06 2016 yes
14 n06 2017 yes
15 n06 2018 yes
16 n06 2019 yes
17 n06 2020  no
18 n06 2021  no
19 n06 2022  no
20 n06 2023  no
21 n12 2013 yes
22 n12 2014  no
23 n12 2015  no
24 n12 2016  no
25 n12 2017  no
26 n12 2018  no
27 n12 2019  no
28 n12 2020  no
29 n12 2021  no
30 n12 2022  no