Search code examples
rtidyverselubridate

How to extract recurring time intervals with lubridate?


In a dataset of 200,000/400,000 rows, with recorded variables for several id's for each minute over a period of 6 months, I want to extract data occurring within a certain time of day, say 7 am to 17 pm. I generally work in Tidyverse.

Lubridate intervals seems to be the function to use, but how do I get this to work on recurring time-spans across dates?

Example data:

id<-c("A","A","A","B","B","B")
datetime<-ymd_hms("2023-05-24 06:59:00","2023-05-24 07:00:00","2023-05-24 07:01:00","2023-05-24 06:59:00","2023-05-24 07:00:00","2023-05-24 07:01:00")
measurement<-c(4,5,7,9,2,6)
df<-data.frame(id,datetime,measurement)

I can add a column with just the time

mutate(time=format(datetime,format="%H:%M"))

But creating an interval with only time, not date, does not seem to work

valid_time<-interval(hm('07:00'),hm('17:00'))

My aim was to then create a column with TRUE/FALSE of whether time point is within interval, and then filter on that.

mutate(valid=time %within% valid_time)

Ideally, I could just make a one-step filtering on datetime, something like the below (not working)

filter(time %in% 0700:1700)

or even better (also not working)

filter(hm(datetime) %in% 0700:1700)

Much thanks in advance!


Solution

  • Here is a data.table approach using as.ITime(), which represents the number of seconds in the day.

    library(data.table)
    setDT(df)[as.ITime(datetime) %between% c(as.ITime("07:00:00"), 
                                             as.ITime("17:00:00")), ]
    
           id            datetime measurement
       <char>              <POSc>       <num>
    1:      A 2023-05-24 07:00:00           5
    2:      A 2023-05-24 07:01:00           7
    3:      B 2023-05-24 07:00:00           2
    4:      B 2023-05-24 07:01:00           6