Search code examples
rdplyrduplicatestidyr

R- Creating New Variable Based on Dupication of Patient Admission Based on Values in a Different Row


I have a dataset that is looking at admissions to hospital. Each patient has an admission time (adm_time) and a discharge time (dc_time). To analyse the data, I would like to find the duplicates (ie the patients that were discharged and then readmitted), change the admission and discharge variable names (to adm_time_1 & dc_time_1), and create a new variable for the second admission and discharge times (to adm_time_2 & dc_time_2). Then I would like to remove the second admission.

Here is a dummy dataset:

    df <- 
      tibble(
        patient_id = c(1, 2, 2, 3),
        sex = c("M", "F", "F", "M"),
        adm_time = c("2022-01-03 14:00:00", "2022-01-09 16:42:00", "2022-01-23 22:30:00", "2022-01-10 12:10:00"),
        dc_time = c("2022-01-12 09:09:00", "2022-01-21 10:08:00", "2022-01-27 09:12:00", "2022-01-14 09:36:00")
  )

This produces the following output: enter image description here

I would like to move the row with patient_id == 2 to the second line, producing the following output: enter image description here

I have tried to use the duplicated function, to identify and rename the new columns, but can't figure out how to combine the 2 admission times. For example, this:

library(tidyverse)    
df |> 
    rename(adm_time_1 = `adm_time`, dc_time_1 = `dc_time`)|>
    mutate(adm_time_2 = lubridate::as_datetime(ifelse(duplicated(patient_id, fromLast = 
    TRUE), `adm_time_1`, NA)))

will select out the duplicated patient ids, but only gives the admission times for the first admission time.

Any assistance would be appreciated.

Thanks,

Ben


Solution

  • Something like this should work:

    df %>%
      group_by(patient_id) %>%
      mutate(adm_time = list(adm_time), dc_time = list(dc_time)) %>%
      distinct() %>%
      unnest_wider(c(adm_time, dc_time), names_sep = "_")
    
      patient_id sex   adm_time_1          adm_time_2          dc_time_1   dc_time_2
           <dbl> <chr> <chr>               <chr>               <chr>       <chr>    
    1          1 M     2022-01-03 14:00:00 NA                  2022-01-12… NA       
    2          2 F     2022-01-09 16:42:00 2022-01-23 22:30:00 2022-01-21… 2022-01-…
    3          3 M     2022-01-10 12:10:00 NA                  2022-01-14… NA