In a dataset of 200,000/400,000 rows, with recorded variables for several id's for each minute over a period of 6 months, I want to extract data occurring within a certain time of day, say 7 am to 17 pm. I generally work in Tidyverse.
Lubridate intervals seems to be the function to use, but how do I get this to work on recurring time-spans across dates?
Example data:
id<-c("A","A","A","B","B","B")
datetime<-ymd_hms("2023-05-24 06:59:00","2023-05-24 07:00:00","2023-05-24 07:01:00","2023-05-24 06:59:00","2023-05-24 07:00:00","2023-05-24 07:01:00")
measurement<-c(4,5,7,9,2,6)
df<-data.frame(id,datetime,measurement)
I can add a column with just the time
mutate(time=format(datetime,format="%H:%M"))
But creating an interval with only time, not date, does not seem to work
valid_time<-interval(hm('07:00'),hm('17:00'))
My aim was to then create a column with TRUE/FALSE of whether time point is within interval, and then filter on that.
mutate(valid=time %within% valid_time)
Ideally, I could just make a one-step filtering on datetime, something like the below (not working)
filter(time %in% 0700:1700)
or even better (also not working)
filter(hm(datetime) %in% 0700:1700)
Much thanks in advance!
Here is a data.table
approach using as.ITime()
, which represents the number of seconds in the day.
library(data.table)
setDT(df)[as.ITime(datetime) %between% c(as.ITime("07:00:00"),
as.ITime("17:00:00")), ]
id datetime measurement
<char> <POSc> <num>
1: A 2023-05-24 07:00:00 5
2: A 2023-05-24 07:01:00 7
3: B 2023-05-24 07:00:00 2
4: B 2023-05-24 07:01:00 6