I need to subset data frame d so that one row is kept for each ID number. The rows kept must contain "I50" (as part of the code, e.g. DI501) in either AD or BD, and of those only the one with the earliest date should be kept.
So in the end I want a data frame with two rows (IDs 1 and 2), each containing I50 in AD or BD and having the earliest possible date, i.e. 2007-12-12 and 2009-12-12.
I have tried a lot but could not find a solution.
ID <- c(1,1,1,1,1,2,2,2,2,2)
AD <- c("DJ400", "DJ300", "DI501", "DI509", "DR409",
        "DI509", "DJ200", "DA300", "DI500", "DR209")
Date <- as.Date(c("2010-12-12", "2011-12-12", "2007-12-12", "2008-12-12", "2009-12-12",
                  "2011-12-12", "2012-12-12", "2008-12-12", "2009-12-12", "2010-12-12"))
BD <- c("DI509", "DI500", "DI401", "DI409", "DR609",
        "DI309", "DJ200", "DA300", "DI500", "DI509")
d <- data.frame(ID, AD, Date, BD)
hf <- subset(d, AD %in% "I50" | BD %in% "I50")
Created on 2022-01-10 by the reprex package (v2.0.0)
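The attempt above returns no rows because %in% tests whether whole strings are equal, so "DI501" %in% "I50" is FALSE. A substring test needs grepl(). A small illustration on the ID 1 values of AD from the first reprex (fixed = TRUE makes grepl() treat the pattern as a literal string; it is an optional choice here since "I50" has no regex metacharacters):

```r
ad <- c("DJ400", "DJ300", "DI501", "DI509", "DR409")

# %in% compares whole strings, so no element equals "I50" exactly
exact <- ad %in% "I50"

# grepl() looks for "I50" anywhere inside each string
partial <- grepl("I50", ad, fixed = TRUE)

exact    # FALSE FALSE FALSE FALSE FALSE
partial  # FALSE FALSE TRUE TRUE FALSE
```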
After trying the first solution I ran into an issue, so I made some small changes; here is a new reprex. I need only one row per ID, but several rows share the same date, which my earlier example did not include.
ID <- c(1,1,1,1,1,2,2,2,2,2)
AD <- c("DJ400", "DJ300", "DI501", "DI509", "DR409",
        "DI509", "DJ200", "DA300", "DI500", "DR209")
Date <- as.Date(c("2010-12-12", "2011-12-12", "2010-12-12", "2012-12-12", "2009-12-12",
                  "2011-12-12", "2012-12-12", "2012-12-12", "2009-12-12", "2010-12-12"))
BD <- c("DI509", "DI500", "DI401", "DI409", "DR609",
        "DI309", "DJ200", "DA300", "DI500", "DI509")
d <- data.frame(ID, AD, Date, BD)
library(dplyr)
d %>%
  group_by(ID) %>%
  filter(if_any(c(AD, BD), ~ grepl("I50", .))) %>%
  slice_min(Date) %>%
  ungroup()
#> # A tibble: 3 x 4
#> ID AD Date BD
#> <dbl> <chr> <date> <chr>
#> 1 1 DJ400 2010-12-12 DI509
#> 2 1 DI501 2010-12-12 DI401
#> 3 2 DI500 2009-12-12 DI500
Created on 2022-01-11 by the reprex package (v2.0.1)
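The third row comes from a tie: in the updated data, ID 1 has two qualifying rows dated 2010-12-12, and slice_min() keeps every tied row by default (with_ties = TRUE). The tie is visible on the qualifying ID 1 dates alone. This sketch uses 2012-12-12 as the fourth of those dates, which does not change the minimum:

```r
# Dates of the ID 1 rows that contain "I50" in AD or BD (updated reprex;
# the fourth date is taken as 2012-12-12)
id1_dates <- as.Date(c("2010-12-12", "2011-12-12", "2010-12-12", "2012-12-12"))

# Every position holding the earliest date: rows 1 and 3 tie
tied <- which(id1_dates == min(id1_dates))
tied  # 1 3
```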
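grepl() matches "I50" anywhere in the string, so use it to filter the rows and then keep the row with the earliest date for each ID. Base R:
# keep rows where "I50" occurs anywhere in AD or BD
d2 <- subset(d, grepl("I50", AD) | grepl("I50", BD))
# split by ID and take the row with the earliest date in each group
do.call(rbind, lapply(split(d2, d2$ID), function(z) z[which.min(z$Date), ]))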
# ID AD Date BD
# 1 1 DI501 2007-12-12 DI401
# 2 2 DI500 2009-12-12 DI500
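Because which.min() returns the index of the first minimum, this base R version already gives exactly one row per ID when dates tie. A sketch applying it to the updated data from the edit, using 2012-12-12 as the fourth ID 1 date (which does not affect the minimum):

```r
# Updated data from the edit (fourth ID 1 date taken as 2012-12-12)
ID <- c(1,1,1,1,1,2,2,2,2,2)
AD <- c("DJ400", "DJ300", "DI501", "DI509", "DR409",
        "DI509", "DJ200", "DA300", "DI500", "DR209")
Date <- as.Date(c("2010-12-12", "2011-12-12", "2010-12-12", "2012-12-12", "2009-12-12",
                  "2011-12-12", "2012-12-12", "2012-12-12", "2009-12-12", "2010-12-12"))
BD <- c("DI509", "DI500", "DI401", "DI409", "DR609",
        "DI309", "DJ200", "DA300", "DI500", "DI509")
d <- data.frame(ID, AD, Date, BD)

# Keep rows where "I50" occurs anywhere in AD or BD
d2 <- subset(d, grepl("I50", AD) | grepl("I50", BD))

# which.min() picks the first row holding the minimum date,
# so the 2010-12-12 tie in ID 1 collapses to a single row
res <- do.call(rbind, lapply(split(d2, d2$ID), function(z) z[which.min(z$Date), ]))
res
```

For ID 1 the first of the two tied rows in the original order (AD = DJ400) is the one kept.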
Or with dplyr:
library(dplyr)
d %>%
  group_by(ID) %>%
  filter(if_any(c(AD, BD), ~ grepl("I50", .))) %>%
  slice_min(Date) %>%
  ungroup()
# # A tibble: 2 x 4
# ID AD Date BD
# <dbl> <chr> <date> <chr>
# 1 1 DI501 2007-12-12 DI401
# 2 2 DI500 2009-12-12 DI500
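To get exactly one row per ID with the updated data, the dplyr pipeline above can pass with_ties = FALSE to slice_min(). A sketch, assuming dplyr 1.0.4 or later (for if_any()) and using 2012-12-12 as the fourth ID 1 date:

```r
library(dplyr)

# Updated data from the edit (fourth ID 1 date taken as 2012-12-12)
ID <- c(1,1,1,1,1,2,2,2,2,2)
AD <- c("DJ400", "DJ300", "DI501", "DI509", "DR409",
        "DI509", "DJ200", "DA300", "DI500", "DR209")
Date <- as.Date(c("2010-12-12", "2011-12-12", "2010-12-12", "2012-12-12", "2009-12-12",
                  "2011-12-12", "2012-12-12", "2012-12-12", "2009-12-12", "2010-12-12"))
BD <- c("DI509", "DI500", "DI401", "DI409", "DR609",
        "DI309", "DJ200", "DA300", "DI500", "DI509")
d <- data.frame(ID, AD, Date, BD)

res <- d %>%
  group_by(ID) %>%
  # keep rows where "I50" occurs anywhere in AD or BD
  filter(if_any(c(AD, BD), ~ grepl("I50", .))) %>%
  # with_ties = FALSE returns exactly n rows per group even when dates tie
  slice_min(Date, n = 1, with_ties = FALSE) %>%
  ungroup()
res
```

If it matters which of the tied rows is kept, add an explicit tie-breaking rule rather than relying on row order.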