Search code examples
rtime-seriessubsettrading

Filtering time series conditional on list of dates by week number (R)


I have a data frame containing a time sequence sampled every 30 minutes (for 2016). I need to make a subset containing every Wednesday 10:30:00 if the week contains no holiday falling on Sunday to Wedneday, and every Thursday 11:00:00 if the week contains a holiday falling on Sunday to Wedneday. This would create a schedule of EIA petroleum weekly report releases. I do not want to use xts.

I know how to subset by day of week and time of day. But I do not know how to subset conditional on the week containing a date present in a list of dates. How could I do that?

The code below creates a subset by day of week and time of day without filtering by holidays. It also includes the list of holiday dates to use as filter.

#Make time sequence every 30mins with Time & DayWk columns 
Calendar30mn <- as.data.frame(seq(as.POSIXlt("2016/1/1 00:00:00"), as.POSIXlt("2016/12/31 23:59:59"), by="30 mins"))
colnames(Calendar30mn) <- "DateTime"
Calendar30mn$Time <- strftime(Calendar30mn$DateTime, format="%H:%M:%S")
Calendar30mn$DayWk <- weekdays(Calendar30mn$DateTime)

#List of US Federal holidays falling on Sunday/Monday/Tuesday/Wedneday 
FedHolidaysSuntoWed <- structure(c(16818, 16846, 16951, 16986, 17049, 17161, 17084), class = "Date")  

-----

#Subset for Wednesday 10:30:00
EIAOildates1 <- subset (Calendar30mn, Time == "10:30:00" & DayWk == "Wednesday")

#Subset for Thursday 11:00:00
EIAOildates2 <- subset (Calendar30mn, Time == "11:00:00" & DayWk == "Thursday")

#Bind subsets and set reverse order (most recent at the top)
EIAOildates <- rbind(EIAOildates1, EIAOildates2)

The above code generates EIAOildates1 containing a subset for Wednesday 10:30:00. I would like that subset to only contain Wednesday 10:30:00 if any day of that week is not present in FedHolidaysSuntoWed. And viceversa for EIAOildates2.


Solution

  • This is the answer:

    library(lubridate)
    
    #Make time sequence every 30mins with Time & DayWk & WkNumber columns 
    Calendar30mn <- as.data.frame(seq(as.POSIXlt("2016/1/1 00:00:00"), as.POSIXlt("2016/12/31 23:59:59"), by="30 mins"))
    colnames(Calendar30mn) <- "DateTime"
    Calendar30mn$Time <- strftime(Calendar30mn$DateTime, format="%H:%M:%S")
    Calendar30mn$DayWk <- weekdays(Calendar30mn$DateTime)
    Calendar30mn$WkNumber <- week(Calendar30mn$DateTime)
    
    #List of US Federal holidays falling on Sunday/Monday/Tuesday/Wedneday & Corresponding WkNumber
    FedHolidaysSuntoWed <- structure(c(16818, 16846, 16951, 16986, 17049, 17161, 17084), class = "Date")  
    FedHolidaysSuntoWedWkNumber <- week(FedHolidaysSuntoWed)
    
    #Subset for Wednesday 10:30:00
    EIAOildates1 <- subset (Calendar30mn, Time == "10:30:00" & DayWk == "Wednesday"
                            & !(Calendar30mn$WkNumber %in% FedHolidaysSuntoWedWkNumber))
    
    #Subset for Thursday 11:00:00
    EIAOildates2 <- subset (Calendar30mn, Time == "11:00:00" & DayWk == "Thursday"
                            & (Calendar30mn$WkNumber %in% FedHolidaysSuntoWedWkNumber))
    
    #Bind and sort subsets  
    EIAOildates <- rbind(EIAOildates1, EIAOildates2)
    EIAOildates <- EIAOildates[(order(as.Date(EIAOildates$DateTime))),]
    

    This is a sample of the output of EIAOildates:

                     DateTime     Time     DayWk WkNumber
    262   2016-01-06 10:30:00 10:30:00 Wednesday        1
    598   2016-01-13 10:30:00 10:30:00 Wednesday        2
    983   2016-01-21 11:00:00 11:00:00  Thursday        3
    1270  2016-01-27 10:30:00 10:30:00 Wednesday        4
    1606  2016-02-03 10:30:00 10:30:00 Wednesday        5
    1942  2016-02-10 10:30:00 10:30:00 Wednesday        6
    2327  2016-02-18 11:00:00 11:00:00  Thursday        7
    16726 2016-12-14 10:30:00 10:30:00 Wednesday       50
    17062 2016-12-21 10:30:00 10:30:00 Wednesday       51
    17447 2016-12-29 11:00:00 11:00:00  Thursday       52