Example dataset
df <- data.frame(ID = c(11, 12, 11, 12, 13, 11, 13, 14, 12, 14, 15),
year = c(2022, 2022, 2021, 2021, 2021, 2020, 2020, 2020, 2019, 2019, 2019),
name = c("Jay", "Rahul", "Jay", "Rahul", "Pierre", "Jay", "Pierre",
"Shannon", "Rahul", "Shannon", "Zachary"))
ID | year | name |
---|---|---|
11 | 2022 | Jay |
12 | 2022 | Rahul |
11 | 2021 | Jay |
12 | 2021 | Rahul |
13 | 2021 | Pierre |
11 | 2020 | Jay |
13 | 2020 | Pierre |
14 | 2020 | Shannon |
12 | 2019 | Rahul |
14 | 2019 | Shannon |
15 | 2019 | Zachary |
How would I go about removing every ID that appears in the 2022 column (Jay and Rahul)? Please keep in mind that I'm working with around 9000 rows and around 70 years, so I can't look up names physically. I need to efficiently tell R to look up all the IDs in 2022 and then remove any row that has those IDs.
I've tried to look for this solution but all I find are articles on conditionally removing rows via filter and/or subset and I haven't found a way to make those work for this scenario.
Also, I'm working with the following packages so feel free to use them in your answer - tidyverse, dplyr, janitor, stringr
The output should look like this
ID | year | name |
---|---|---|
13 | 2021 | Pierre |
13 | 2020 | Pierre |
14 | 2020 | Shannon |
14 | 2019 | Shannon |
15 | 2019 | Zachary |
Thank you
Here are a couple dplyr
methods:
# grouped filter
df %>% group_by(ID) %>%
filter(!any(year == 2022)) %>%
ungroup()
# anti-join
df %>%
filter(year == 2022) %>%
select(ID) %>%
anti_join(x = df, y = ., by = "ID")