My dataset contains scan times by nurses in hospital rooms. My goal is to determine how many minutes the the room was unattended.
The first thing I need to do is to create 2 new rows per room for each day. The new records should represent the start and end of the day (00:00:00 and 23:59:59). The logic for creating these two records is that for example if the last scan for 2023-04-24 was at 3pm for Room1, we would like to know how many minutes had passed from 2023-04-24 23:59:59 to 2023-04-24 3:00:00.
Room ScanDT
1 room1 2023-04-24 10:08:38
2 room1 2023-04-24 10:09:36
3 room1 2023-04-24 11:54:35
4 room1 2023-05-24 13:51:10
5 room2 2023-04-24 18:51:10
6 room2 2023-04-24 20:51:10
What I need:
Room ScanDT NewRecord
1 room1 2023-04-24 00:00:00 New
2 room1 2023-04-24 10:08:38 Old
3 room1 2023-04-24 10:09:36 Old
4 room1 2023-04-24 11:54:35 Old
5 room1 2023-04-24 23:59:59 New
6 room1 2023-05-24 00:00:00 New
7 room1 2023-05-24 13:51:10 Old
8 room1 2023-05-24 23:59:59 New
9 room2 2023-04-24 00:00:00 New
10 room2 2023-04-24 18:51:10 Old
11 room2 2023-04-24 20:51:10 Old
12 room2 2023-04-24 23:23:23 New
Sample data
df <- data.frame(Room = c('room1', 'room1', 'room1', 'room1',
'room2', 'room2'),
ScanDT = as.POSIXct(c('2023-04-24 10:08:38', '2023-04-24 10:09:36', '2023-04-24 11:54:35', '2023-05-24 13:51:10',
'2023-04-24 18:51:10', '2023-04-24 20:51:10')))
Here's an attempt using the data.table package:
library(data.table)
setDT(df)
## add a scan id to show how to keep it
df[, ScanID := .I]
df[, ScanDate := as.POSIXct(trunc(ScanDT, units="days"))]
dfout <- df[, .(
ScanID=c(NA,ScanID,NA),
ScanDT = c(ScanDate, ScanDT, ScanDate + as.difftime(1, units="days") - 1)),
by=.(Room, ScanDate)]
dfout
## Room ScanDate ScanID ScanDT
## <char> <POSc> <int> <POSc>
## 1: room1 2023-04-24 NA 2023-04-24 00:00:00
## 2: room1 2023-04-24 1 2023-04-24 10:08:38
## 3: room1 2023-04-24 2 2023-04-24 10:09:36
## 4: room1 2023-04-24 3 2023-04-24 11:54:35
## 5: room1 2023-04-24 NA 2023-04-24 23:59:59
## 6: room1 2023-05-24 NA 2023-05-24 00:00:00
## 7: room1 2023-05-24 4 2023-05-24 13:51:10
## 8: room1 2023-05-24 NA 2023-05-24 23:59:59
## 9: room2 2023-04-24 NA 2023-04-24 00:00:00
##10: room2 2023-04-24 5 2023-04-24 18:51:10
##11: room2 2023-04-24 6 2023-04-24 20:51:10
##12: room2 2023-04-24 NA 2023-04-24 23:59:59
Further calculations of differences if required:
dfout[, ScanDTdiff := c(NA, `units<-`(diff(ScanDT), "mins")), by=.(Room, ScanDate)]
## Room ScanDate ScanID ScanDT ScanDTdiff
## <char> <POSc> <int> <POSc> <difftime>
## 1: room1 2023-04-24 NA 2023-04-24 00:00:00 NA mins
## 2: room1 2023-04-24 1 2023-04-24 10:08:38 608.6333333 mins
## 3: room1 2023-04-24 2 2023-04-24 10:09:36 0.9666667 mins
## 4: room1 2023-04-24 3 2023-04-24 11:54:35 104.9833333 mins
## 5: room1 2023-04-24 NA 2023-04-24 23:59:59 725.4000000 mins
## 6: room1 2023-05-24 NA 2023-05-24 00:00:00 NA mins
## 7: room1 2023-05-24 4 2023-05-24 13:51:10 831.1666667 mins
## 8: room1 2023-05-24 NA 2023-05-24 23:59:59 608.8166667 mins
## 9: room2 2023-04-24 NA 2023-04-24 00:00:00 NA mins
##10: room2 2023-04-24 5 2023-04-24 18:51:10 1131.1666667 mins
##11: room2 2023-04-24 6 2023-04-24 20:51:10 120.0000000 mins
##12: room2 2023-04-24 NA 2023-04-24 23:59:59 188.8166667 mins