Search code examples
rdatetimenew-operator

Add two rows per Group. Start and end of date


My dataset contains scan times by nurses in hospital rooms. My goal is to determine how many minutes the the room was unattended.

The first thing I need to do is to create 2 new rows per room for each day. The new records should represent the start and end of the day (00:00:00 and 23:59:59). The logic for creating these two records is that for example if the last scan for 2023-04-24 was at 3pm for Room1, we would like to know how many minutes had passed from 2023-04-24 23:59:59 to 2023-04-24 3:00:00.

   Room              ScanDT
1 room1 2023-04-24 10:08:38
2 room1 2023-04-24 10:09:36
3 room1 2023-04-24 11:54:35
4 room1 2023-05-24 13:51:10
5 room2 2023-04-24 18:51:10
6 room2 2023-04-24 20:51:10

What I need:

    Room              ScanDT NewRecord
1  room1 2023-04-24 00:00:00       New
2  room1 2023-04-24 10:08:38       Old
3  room1 2023-04-24 10:09:36       Old
4  room1 2023-04-24 11:54:35       Old
5  room1 2023-04-24 23:59:59       New
6  room1 2023-05-24 00:00:00       New
7  room1 2023-05-24 13:51:10       Old
8  room1 2023-05-24 23:59:59       New
9  room2 2023-04-24 00:00:00       New
10 room2 2023-04-24 18:51:10       Old
11 room2 2023-04-24 20:51:10       Old
12 room2 2023-04-24 23:23:23       New

Sample data

df <- data.frame(Room = c('room1', 'room1', 'room1', 'room1',
                          'room2', 'room2'),
                 ScanDT = as.POSIXct(c('2023-04-24 10:08:38', '2023-04-24 10:09:36', '2023-04-24 11:54:35', '2023-05-24 13:51:10',
                                       '2023-04-24 18:51:10', '2023-04-24 20:51:10')))

Solution

  • Here's an attempt using the data.table package:

    library(data.table)
    setDT(df)
    ## add a scan id to show how to keep it
    df[, ScanID := .I]
    
    df[, ScanDate := as.POSIXct(trunc(ScanDT, units="days"))]
    dfout <- df[, .(
                ScanID=c(NA,ScanID,NA),
                ScanDT = c(ScanDate, ScanDT, ScanDate + as.difftime(1, units="days") - 1)), 
                by=.(Room, ScanDate)]
    dfout
    ##      Room   ScanDate ScanID              ScanDT
    ##    <char>     <POSc>  <int>              <POSc>
    ## 1:  room1 2023-04-24     NA 2023-04-24 00:00:00
    ## 2:  room1 2023-04-24      1 2023-04-24 10:08:38
    ## 3:  room1 2023-04-24      2 2023-04-24 10:09:36
    ## 4:  room1 2023-04-24      3 2023-04-24 11:54:35
    ## 5:  room1 2023-04-24     NA 2023-04-24 23:59:59
    ## 6:  room1 2023-05-24     NA 2023-05-24 00:00:00
    ## 7:  room1 2023-05-24      4 2023-05-24 13:51:10
    ## 8:  room1 2023-05-24     NA 2023-05-24 23:59:59
    ## 9:  room2 2023-04-24     NA 2023-04-24 00:00:00
    ##10:  room2 2023-04-24      5 2023-04-24 18:51:10
    ##11:  room2 2023-04-24      6 2023-04-24 20:51:10
    ##12:  room2 2023-04-24     NA 2023-04-24 23:59:59
    

    Further calculations of differences if required:

    dfout[, ScanDTdiff := c(NA, `units<-`(diff(ScanDT), "mins")), by=.(Room, ScanDate)]
    
    ##      Room   ScanDate ScanID              ScanDT        ScanDTdiff
    ##    <char>     <POSc>  <int>              <POSc>        <difftime>
    ## 1:  room1 2023-04-24     NA 2023-04-24 00:00:00           NA mins
    ## 2:  room1 2023-04-24      1 2023-04-24 10:08:38  608.6333333 mins
    ## 3:  room1 2023-04-24      2 2023-04-24 10:09:36    0.9666667 mins
    ## 4:  room1 2023-04-24      3 2023-04-24 11:54:35  104.9833333 mins
    ## 5:  room1 2023-04-24     NA 2023-04-24 23:59:59  725.4000000 mins
    ## 6:  room1 2023-05-24     NA 2023-05-24 00:00:00           NA mins
    ## 7:  room1 2023-05-24      4 2023-05-24 13:51:10  831.1666667 mins
    ## 8:  room1 2023-05-24     NA 2023-05-24 23:59:59  608.8166667 mins
    ## 9:  room2 2023-04-24     NA 2023-04-24 00:00:00           NA mins
    ##10:  room2 2023-04-24      5 2023-04-24 18:51:10 1131.1666667 mins
    ##11:  room2 2023-04-24      6 2023-04-24 20:51:10  120.0000000 mins
    ##12:  room2 2023-04-24     NA 2023-04-24 23:59:59  188.8166667 mins