i have a df, which containing 4 columns:
Its looks like this:
id | mo | sample_date | pid_code |
---|---|---|---|
1 | Staphylococcus epidermidis | 2019-12-13 | 120 |
2 | Staphylococcus epidermidis | 2019-12-14 | 120 |
3 | Staphylococcus capitis | 2020-01-13 | 131 |
4 | Staphylococcus hominis | 2021-11-14 | 155 |
5 | Staphylococcus hominis | 2021-11-17 | 155 |
6 | Micrococcus luteus | 2023-06-06 | 133 |
7 | Micrococcus luteus | 2023-06-07 | 133 |
i want to filter those rows where the pid_code and mo is same, and repeated in 2 days.
So i want to get back these rows.
id | mo | sample_date | pid_code |
---|---|---|---|
1 | Staphylococcus epidermidis | 2019-12-13 | 120 |
2 | Staphylococcus epidermidis | 2019-12-14 | 120 |
6 | Micrococcus luteus | 2023-06-06 | 133 |
7 | Micrococcus luteus | 2023-06-07 | 133 |
Do you have something advise? Thanks
Here is a dplyr
solution:
library(dplyr)
df %>%
filter(any(abs(difftime(sample_date, lag(sample_date), units = "days")) <= 2), .by=c(mo, pid_code))
id mo sample_date pid_code
1 Staphylococcus epidermidis 2019-12-13 120
2 Staphylococcus epidermidis 2019-12-14 120
6 Micrococcus luteus 2023-06-06 133
7 Micrococcus luteus 2023-06-07 133