I have a dataset that is looking at admissions to hospital. Each patient has an admission time (adm_time
) and a discharge time (dc_time
). To analyse the data, I would like to find the duplicates (ie the patients that were discharged and then readmitted), change the admission and discharge variable names (to adm_time_1
& dc_time_1
), and create a new variable for the second admission and discharge times (to adm_time_2
& dc_time_2
). Then I would like to remove the second admission.
Here is a dummy dataset:
df <-
tibble(
patient_id = c(1, 2, 2, 3),
sex = c("M", "F", "F", "M"),
adm_time = c("2022-01-03 14:00:00", "2022-01-09 16:42:00", "2022-01-23 22:30:00", "2022-01-10 12:10:00"),
dc_time = c("2022-01-12 09:09:00", "2022-01-21 10:08:00", "2022-01-27 09:12:00", "2022-01-14 09:36:00")
)
This produces the following output:
I would like to move the row with patient_id == 2
to the second line, producing the following output:
I have tried to use the duplicated
function, to identify and rename the new columns, but can't figure out how to combine the 2 admission times. For example, this:
library(tidyverse)
df |>
rename(adm_time_1 = `adm_time`, dc_time_1 = `dc_time`)|>
mutate(adm_time_2 = lubridate::as_datetime(ifelse(duplicated(patient_id, fromLast =
TRUE), `adm_time_1`, NA)))
will select out the duplicated patient ids, but only gives the admission times for the first admission time.
Any assistance would be appreciated.
Thanks,
Ben
Something like this should work:
df %>%
group_by(patient_id) %>%
mutate(adm_time = list(adm_time), dc_time = list(dc_time)) %>%
distinct() %>%
unnest_wider(c(adm_time, dc_time), names_sep = "_")
patient_id sex adm_time_1 adm_time_2 dc_time_1 dc_time_2
<dbl> <chr> <chr> <chr> <chr> <chr>
1 1 M 2022-01-03 14:00:00 NA 2022-01-12… NA
2 2 F 2022-01-09 16:42:00 2022-01-23 22:30:00 2022-01-21… 2022-01-…
3 3 M 2022-01-10 12:10:00 NA 2022-01-14… NA