My goal is to merge two datasets using date ranges. Dataset1 contains patients stays in a hospital overtime. Dataset2 contains room information overtime. My goal is to identify what type of room the stays were in my Dataset1. It can get complicated since room type in some hospitalizations can change. For example patient 101 second hospitalization was part ICU and part Emergency.
Dataset 1:
PatientID Hospital Room StartDate EndDate
1 101 ODCC 4SW 2020-06-04 16:21:47 2020-06-22 15:12:39
2 101 ODCC 1W 2020-06-22 15:12:40 2020-09-08 14:03:34
3 101 ODCC 1N 2020-09-08 14:03:35 2020-10-02 06:50:24
4 101 ODCC 1W 2020-10-02 06:50:25 2020-10-05 14:25:54
Dataset 2:
Hospital Room Type StartDT EndDT
1 ODCC 11A Other 2020-01-01 2021-05-12
2 ODCC 1W ICU 2020-06-01 2020-07-30
3 ODCC 1W Emergency 2020-08-01 2021-05-12
4 ODCC 1N Emergency 2020-11-05 2021-02-07
My goal:
Patient.ID Hospital Room StartDate EndDate Type StartDT EndDT
1 101 ODCC 4SW 2020-06-04 16:21:47 2020-06-22 15:12:39 <NA> <NA> <NA>
2 101 ODCC 1W 2020-06-22 15:12:40 2020-09-08 14:03:34 ICU 2020-06-01 2020-07-30
3 102 ODCC 1W 2020-06-22 15:12:40 2020-09-08 14:03:34 Emergency 2020-08-01 2021-05-12
4 101 ODCC 1N 2020-09-08 14:03:35 2020-10-02 06:50:24 <NA> <NA> <NA>
5 101 ODCC 1W 2020-10-02 06:50:25 2020-10-05 14:25:54 Emergency 2020-08-01 2021-05-12
Below you can find codes to replicate my datasets.
stays <- structure(list(
PatientID = c(101, 101, 101, 101),
Hospital = c("ODCC", "ODCC", "ODCC", "ODCC"),
Room = c("4SW", "1W", "1N", "1W"),
StartDate = structure(c(1591287707, 1592838760, 1599573815, 1601621425),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
EndDate = structure(c(1592838759, 1599573814, 1601621424, 1601907954),
class = c("POSIXct", "POSIXt"), tzone = "UTC")),
class = "data.frame", row.names = c(NA, -4L))
type <- structure(list(
Hospital = c("ODCC", "ODCC", "ODCC", "ODCC"),
Room = c("11A", "1W", "1W", "1N"), Type = c("Other", "ICU", "Emergency",
"Emergency"),
StartDT = structure(c(1577836800, 1590969600, 1596240000, 1604534400),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
EndDT = structure(c(1620777600, 1596067200, 1620777600, 1612656000),
class = c("POSIXct","POSIXt"), tzone = "UTC")),
class = "data.frame", row.names = c(NA, -4L))
You could use foverlaps
:
library(data.table)
setDT(stays)
setDT(type)
setkey(type,Hospital,Room,StartDT,EndDT)
foverlaps(stays,type,
by.x=c("Hospital","Room","StartDate","EndDate"),
by.y=c("Hospital","Room","StartDT","EndDT"),type="any")[
,.(
PatientID,
Hospital,
Room,
StartDate,
EndDate,
Type,
StartDT,
EndDT)]
PatientID Hospital Room StartDate EndDate Type StartDT EndDT
<num> <char> <char> <POSc> <POSc> <char> <POSc> <POSc>
1: 101 ODCC 4SW 2020-06-04 16:21:47 2020-06-22 15:12:39 <NA> <NA> <NA>
2: 101 ODCC 1W 2020-06-22 15:12:40 2020-09-08 14:03:34 ICU 2020-06-01 2020-07-30
3: 101 ODCC 1W 2020-06-22 15:12:40 2020-09-08 14:03:34 Emergency 2020-08-01 2021-05-12
4: 101 ODCC 1N 2020-09-08 14:03:35 2020-10-02 06:50:24 <NA> <NA> <NA>
5: 101 ODCC 1W 2020-10-02 06:50:25 2020-10-05 14:25:54 Emergency 2020-08-01 2021-05-12