Search code examples
rtimestamprolling-computation

Calculate Rolling 12 Hours by Group in R


I am working on a project where I have to only include patients who had lab tests ordered at least 12 hours apart, and to keep the timestamp of each included lab test. The issue is that many patients get several labs done within the 12 hour window, but the client has asked to not include those tests. I have made it this far:

#Create dummy dataset
df = data.frame(
  "Encounter" = c(rep("12345", times=16), rep("67890", times = 5)),
  "Timestamp" = c("01/06/2022 04:00:00", "01/07/2022 08:00:00",
                   "01/08/2022 00:00:00", "01/08/2022 04:00:00",
                   "01/08/2022 08:00:00", "01/08/2022 20:00:00",
                   "01/09/2022 04:00:00", "01/09/2022 08:00:00",
                   "01/09/2022 20:00:00", "01/09/2022 23:26:00",
                   "01/10/2022 00:00:00", "01/10/2022 08:00:00",
                   "01/10/2022 20:00:00", "01/11/2022 00:00:00",
                   "01/11/2022 20:00:00", "01/12/2022 04:00:00",
                   "11/10/2021 11:00:00", "11/10/2021 12:00:00",
                   "11/10/2021 13:00:00", "11/10/2021 14:00:00",
                   "11/11/2021 00:00:00"))

#Convert timestamp to POSIXlt format
df$Timestamp <- strptime(as.character(df$Timestamp), format="%m/%d/%Y %H:%M")

#Calculate time (in hours) between each previous timestamp by Encounter
df <- df %>% 
  group_by(Encounter) %>% 
  arrange(Encounter, Timestamp) %>% 
  mutate(difftime(Timestamp, lag(Timestamp), units="hours"))

I can't seem to figure out what to do next. It seems like I need to calculate a rolling 12-hours that then resets to 0 once a row hits 12 hours, but I'm not sure how to go about it. Below is my ideal result:

df$Keep.Row <- c(1,1,1,0,0,1,0,1,1,0,0,1,1,0,1,0,1,0,0,0,1)

Screenshot of desired final product


Solution

  • There is absolutely nothing elegant about this, but I believe it gives you what you’re looking for. I use a temporary variable to store the “rolling” sum before it’s reset once the hours between is 12 or greater.

    library(tidyverse)
    
    df <- df %>% 
      group_by(Encounter) %>% 
      arrange(Encounter, Timestamp) %>% 
      mutate(time_diff = difftime(Timestamp, lag(Timestamp), units="hours")) %>%
      replace_na(list(time_diff = 0)) %>%
      mutate(temp = ifelse(time_diff < 12 & lag(time_diff) >= 12, time_diff, lag(time_diff) + time_diff),
             temp = ifelse(is.na(temp), 0, temp),
             hours_between = ifelse(time_diff >= 12, time_diff,
                            ifelse(time_diff < 12 & lag(time_diff) >= 12, time_diff, lag(temp) + time_diff)),
             keep = ifelse(hours_between >= 12 | is.na(hours_between), 1, 0)) %>%
      select(-temp)
    

    Created on 2022-01-27 by the reprex package (v2.0.1)