r  median  calculated-field

R code to calculate median of values in consecutive time-windows of fixed length


I have a dataset of pulse oximetry values, with one measurement for each consecutive 20-minute period. I would like to calculate the median value for each consecutive 6-hour window. The windows do not overlap, so this is not a rolling median calculation. Any tips for R code to do this efficiently? The entire dataset covers multiple patients and days of data, approximately 1 million rows.


Solution

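  • The lubridate package from the tidyverse is very helpful here. Its floor_date function rounds each timestamp down to the start of its 6-hour window (00:00, 06:00, 12:00 or 18:00), and you can then group by that window start and take the median.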

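    library(dplyr)
    library(lubridate)
    
    # Example data; date_time is stored as text
    df <- tibble::tribble(
      ~date_time,        ~pulse_ox,
      "1/1/21 11:21.21", 97,
      "1/2/21 11:34.34", 89
    )
    
    df_new <- df %>%
      # Parse the text timestamps (dmy_hms reads the day first; use mdy_hms
      # for month-first data), then round down to the start of the 6-hour window
      mutate(date_time_6_hour = floor_date(dmy_hms(date_time), "6 hours")) %>%
      # One group per window, one median per group
      group_by(date_time_6_hour) %>%
      summarize(median = median(pulse_ox))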
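
Since the question mentions multiple patients, the same idea can probably be extended by adding a patient identifier to the grouping, so that readings from different patients in the same 6-hour window are not pooled. The sketch below is only an illustration: the column names patient_id, window_start, n_readings and median_pulse_ox are assumptions, the data are made up, and it assumes date_time has already been parsed to a date-time. Using na.rm = TRUE is also an assumption about how missing readings should be handled.

```r
library(dplyr)
library(lubridate)

# Made-up data: two hypothetical patients, one reading every 20 minutes
# for 12 hours (36 readings each). All column names are assumptions.
times <- seq(ymd_hms("2021-01-01 00:00:00"), by = "20 min", length.out = 36)
df <- tibble(
  patient_id = rep(c("A", "B"), each = 36),
  date_time  = rep(times, times = 2),
  pulse_ox   = c(rep(c(96, 97, 98), 12), rep(c(90, 92, 94), 12))
)

medians <- df %>%
  # Start of the non-overlapping 6-hour window each reading falls in
  mutate(window_start = floor_date(date_time, "6 hours")) %>%
  # Group by patient AND window so patients are never mixed
  group_by(patient_id, window_start) %>%
  summarize(
    n_readings      = n(),                            # readings in the window
    median_pulse_ox = median(pulse_ox, na.rm = TRUE), # ignores missing values
    .groups = "drop"
  )

print(medians)
```

Keeping a count such as n_readings alongside the median makes it easy to spot windows with fewer than the expected 18 readings (6 hours at one reading per 20 minutes). A grouped summary like this is a single pass over the data, so it should be workable at around 1 million rows, though that has not been measured here.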