Search code examples
rdplyr

Find the minimum non-missing date among dates in a row in R?


I have the following data frame in R:

USUBJID <- c(1, 2, 3)
EOT <- c("2021-08-17", "2020-08-10", "2020-08-10")
DTHDT <- c(NA, "2021-04-10", "2022-08-05")
EOSDT <- c("2023-08-17", "2021-04-10", "2022-08-05")

# Create DataFrame
dat<-data.frame(USUBJID=USUBJID, EOTDT=as.Date(c("2021-08-17", "2020-08-10", "2020-08-10")), 
                DTHDT=as.Date(c(NA, "2021-04-10", "2022-08-05`enter code here`")), 
                EOSDT=as.Date(c("2023-08-17", "2021-04-10", "2022-08-05"))) 

I would like to generate a fifth column that takes the minimum non-missing date among the 3 other dates for each USUBJID. For instance, the the first subject that would be 2021-08-17.


Solution

  • Base R:

    cbind(dat, minimum = apply(dat[-1], 1, min, na.rm = TRUE))
    
      USUBJID      EOTDT      DTHDT      EOSDT    minimum
    1       1 2021-08-17       <NA> 2023-08-17 2021-08-17
    2       2 2020-08-10 2021-04-10 2021-04-10 2020-08-10
    3       3 2020-08-10 2022-08-05 2022-08-05 2020-08-10 
    
    cbind(dat, minimum = do.call(pmin, c(dat[-1], na.rm = TRUE)))
      USUBJID      EOTDT      DTHDT      EOSDT    minimum
    1       1 2021-08-17       <NA> 2023-08-17 2021-08-17
    2       2 2020-08-10 2021-04-10 2021-04-10 2020-08-10
    3       3 2020-08-10 2022-08-05 2022-08-05 2020-08-10
    

    dat %>%
      mutate(minimum = exec(pmin, !!!dat[-1], na.rm = TRUE))
    
      USUBJID      EOTDT      DTHDT      EOSDT    minimum
    1       1 2021-08-17       <NA> 2023-08-17 2021-08-17
    2       2 2020-08-10 2021-04-10 2021-04-10 2020-08-10
    3       3 2020-08-10 2022-08-05 2022-08-05 2020-08-10
    
    
    dat %>%
       mutate(minimum = min(c_across(everything()), na.rm = TRUE), .by = USUBJID)
      USUBJID      EOTDT      DTHDT      EOSDT    minimum
    1       1 2021-08-17       <NA> 2023-08-17 2021-08-17
    2       2 2020-08-10 2021-04-10 2021-04-10 2020-08-10
    3       3 2020-08-10 2022-08-05 2022-08-05 2020-08-10
    
    dat %>%
       pivot_longer(-USUBJID, values_drop_na = TRUE) %>%
       summarise(minimum = min(value), .by = USUBJID) %>%
       right_join(dat)
    
    # A tibble: 3 × 5
      USUBJID minimum    EOTDT      DTHDT      EOSDT     
        <dbl> <date>     <date>     <date>     <date>    
    1       1 2021-08-17 2021-08-17 NA         2023-08-17
    2       2 2020-08-10 2020-08-10 2021-04-10 2021-04-10
    3       3 2020-08-10 2020-08-10 2022-08-05 2022-08-05