Search code examples
rjoinmergedata.tabledate-range

Merge two datasets based on date ranges. R


My goal is to merge two datasets using date ranges. Dataset1 contains patients stays in a hospital overtime. Dataset2 contains room information overtime. My goal is to identify what type of room the stays were in my Dataset1. It can get complicated since room type in some hospitalizations can change. For example patient 101 second hospitalization was part ICU and part Emergency.

Dataset 1:

  PatientID Hospital Room           StartDate             EndDate
1       101     ODCC  4SW 2020-06-04 16:21:47 2020-06-22 15:12:39
2       101     ODCC   1W 2020-06-22 15:12:40 2020-09-08 14:03:34
3       101     ODCC   1N 2020-09-08 14:03:35 2020-10-02 06:50:24
4       101     ODCC   1W 2020-10-02 06:50:25 2020-10-05 14:25:54 

Dataset 2:

  Hospital Room      Type    StartDT      EndDT
1     ODCC  11A     Other 2020-01-01 2021-05-12
2     ODCC   1W       ICU 2020-06-01 2020-07-30
3     ODCC   1W Emergency 2020-08-01 2021-05-12
4     ODCC   1N Emergency 2020-11-05 2021-02-07

My goal:

  Patient.ID Hospital Room           StartDate             EndDate      Type    StartDT      EndDT
1        101     ODCC  4SW 2020-06-04 16:21:47 2020-06-22 15:12:39      <NA>       <NA>       <NA>
2        101     ODCC   1W 2020-06-22 15:12:40 2020-09-08 14:03:34       ICU 2020-06-01 2020-07-30
3        102     ODCC   1W 2020-06-22 15:12:40 2020-09-08 14:03:34 Emergency 2020-08-01 2021-05-12
4        101     ODCC   1N 2020-09-08 14:03:35 2020-10-02 06:50:24      <NA>       <NA>       <NA>
5        101     ODCC   1W 2020-10-02 06:50:25 2020-10-05 14:25:54 Emergency 2020-08-01 2021-05-12

Below you can find codes to replicate my datasets.

stays <- structure(list(
  PatientID = c(101, 101, 101, 101),
  Hospital = c("ODCC", "ODCC", "ODCC", "ODCC"),
  Room = c("4SW", "1W", "1N", "1W"), 
  StartDate = structure(c(1591287707, 1592838760, 1599573815, 1601621425),
                        class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
  EndDate = structure(c(1592838759, 1599573814, 1601621424, 1601907954), 
                      class = c("POSIXct", "POSIXt"), tzone = "UTC")),
  class = "data.frame", row.names = c(NA, -4L))

type <- structure(list(
  Hospital = c("ODCC", "ODCC", "ODCC", "ODCC"), 
  Room = c("11A", "1W", "1W", "1N"), Type = c("Other", "ICU", "Emergency",
                                              "Emergency"), 
  StartDT = structure(c(1577836800, 1590969600, 1596240000, 1604534400), 
                      class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  EndDT = structure(c(1620777600, 1596067200, 1620777600, 1612656000),
                    class = c("POSIXct","POSIXt"), tzone = "UTC")), 
  class = "data.frame", row.names = c(NA, -4L))

Solution

  • You could use foverlaps:

    library(data.table)
    setDT(stays)
    setDT(type)
    setkey(type,Hospital,Room,StartDT,EndDT)
    foverlaps(stays,type,
              by.x=c("Hospital","Room","StartDate","EndDate"),
              by.y=c("Hospital","Room","StartDT","EndDT"),type="any")[
                ,.(
                  PatientID,
                  Hospital,
                  Room,
                  StartDate,
                  EndDate,
                  Type,
                  StartDT,
                  EndDT)]       
    
       PatientID Hospital   Room           StartDate             EndDate      Type    StartDT      EndDT
           <num>   <char> <char>              <POSc>              <POSc>    <char>     <POSc>     <POSc>
    1:       101     ODCC    4SW 2020-06-04 16:21:47 2020-06-22 15:12:39      <NA>       <NA>       <NA>
    2:       101     ODCC     1W 2020-06-22 15:12:40 2020-09-08 14:03:34       ICU 2020-06-01 2020-07-30
    3:       101     ODCC     1W 2020-06-22 15:12:40 2020-09-08 14:03:34 Emergency 2020-08-01 2021-05-12
    4:       101     ODCC     1N 2020-09-08 14:03:35 2020-10-02 06:50:24      <NA>       <NA>       <NA>
    5:       101     ODCC     1W 2020-10-02 06:50:25 2020-10-05 14:25:54 Emergency 2020-08-01 2021-05-12