I have the following data frame in R:
USUBJID <- c(1, 2, 3)
EOT <- c("2021-08-17", "2020-08-10", "2020-08-10")
DTHDT <- c(NA, "2021-04-10", "2022-08-05")
EOSDT <- c("2023-08-17", "2021-04-10", "2022-08-05")
# Create DataFrame
dat<-data.frame(USUBJID=USUBJID, EOTDT=as.Date(c("2021-08-17", "2020-08-10", "2020-08-10")),
DTHDT=as.Date(c(NA, "2021-04-10", "2022-08-05`enter code here`")),
EOSDT=as.Date(c("2023-08-17", "2021-04-10", "2022-08-05")))
I would like to generate a fifth column that takes the minimum non-missing date among the 3 other dates for each USUBJID. For instance, the the first subject that would be 2021-08-17.
Base R:
cbind(dat, minimum = apply(dat[-1], 1, min, na.rm = TRUE))
USUBJID EOTDT DTHDT EOSDT minimum
1 1 2021-08-17 <NA> 2023-08-17 2021-08-17
2 2 2020-08-10 2021-04-10 2021-04-10 2020-08-10
3 3 2020-08-10 2022-08-05 2022-08-05 2020-08-10
cbind(dat, minimum = do.call(pmin, c(dat[-1], na.rm = TRUE)))
USUBJID EOTDT DTHDT EOSDT minimum
1 1 2021-08-17 <NA> 2023-08-17 2021-08-17
2 2 2020-08-10 2021-04-10 2021-04-10 2020-08-10
3 3 2020-08-10 2022-08-05 2022-08-05 2020-08-10
dat %>%
mutate(minimum = exec(pmin, !!!dat[-1], na.rm = TRUE))
USUBJID EOTDT DTHDT EOSDT minimum
1 1 2021-08-17 <NA> 2023-08-17 2021-08-17
2 2 2020-08-10 2021-04-10 2021-04-10 2020-08-10
3 3 2020-08-10 2022-08-05 2022-08-05 2020-08-10
dat %>%
mutate(minimum = min(c_across(everything()), na.rm = TRUE), .by = USUBJID)
USUBJID EOTDT DTHDT EOSDT minimum
1 1 2021-08-17 <NA> 2023-08-17 2021-08-17
2 2 2020-08-10 2021-04-10 2021-04-10 2020-08-10
3 3 2020-08-10 2022-08-05 2022-08-05 2020-08-10
dat %>%
pivot_longer(-USUBJID, values_drop_na = TRUE) %>%
summarise(minimum = min(value), .by = USUBJID) %>%
right_join(dat)
# A tibble: 3 × 5
USUBJID minimum EOTDT DTHDT EOSDT
<dbl> <date> <date> <date> <date>
1 1 2021-08-17 2021-08-17 NA 2023-08-17
2 2 2020-08-10 2020-08-10 2021-04-10 2021-04-10
3 3 2020-08-10 2020-08-10 2022-08-05 2022-08-05