I have a dataset of pulse oximetry values, with one measurement for each consecutive 20-minute period. I would like to calculate the median value for each consecutive 6-hour window. The windows do not overlap, so this is not a rolling median calculation. Any tips for R code to do this efficiently? The entire dataset covers multiple patients and days of data, approximately 1 million rows.
The lubridate package from the tidyverse is very helpful. You can find the start of each 6-hour window by using the floor_date function, then group on that value.
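library(dplyr)
library(lubridate)

# Small example data; timestamps are stored as text
df <- tibble::tribble(
  ~date_time,        ~pulse_ox,
  "1/1/21 11:21.21", 97,
  "1/2/21 11:34.34", 89
)

df_new <- df %>%
  # dmy_hms() reads the text as day/month/year; use mdy_hms() if the month comes first.
  # floor_date() then rounds each timestamp down to the start of its 6-hour window.
  mutate(date_time_6_hour = floor_date(dmy_hms(date_time), "6 hours")) %>%
  # With several patients, add the patient identifier column to group_by() as well
  group_by(date_time_6_hour) %>%
  summarize(median = median(pulse_ox))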
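
Since the question mentions multiple patients, here is a minimal sketch of the same idea with the grouping done per patient. The column name patient_id, the object names, and the sample readings are all made up for illustration; swap in whatever your data actually uses. It also assumes the timestamps are already parsed as date-times.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: patient_id is an assumed column name, and the readings
# are invented values spaced 20 minutes apart across a 6-hour boundary.
readings <- tibble::tibble(
  patient_id = rep(c("A", "B"), each = 4),
  date_time  = ymd_hms(rep(c("2021-01-01 05:20:00", "2021-01-01 05:40:00",
                             "2021-01-01 06:00:00", "2021-01-01 06:20:00"),
                           times = 2)),
  pulse_ox   = c(97, 95, 93, 91, 99, 98, 90, 92)
)

medians <- readings %>%
  # Round each timestamp down to the start of its 6-hour window
  mutate(window_start = floor_date(date_time, "6 hours")) %>%
  # Group by patient and window so patients are never pooled together
  group_by(patient_id, window_start) %>%
  # One median per patient per window; .groups = "drop" returns an ungrouped tibble
  summarize(median_pulse_ox = median(pulse_ox), .groups = "drop")

print(medians)
```

The windows start at 00:00, 06:00, 12:00 and 18:00 because floor_date rounds down to multiples of 6 hours within each day. A grouped summary like this should be manageable for around a million rows, though I have not timed it on data of that size.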