
How do I call a function using a specific time window?


Suppose I have a zoo object (or it could be a data.frame) that has an index on "time of day" and has some value (see sample data below):

                    val
...
2006-08-01 12:00    23
2006-08-01 12:01    24
2006-08-01 12:02    25
2006-08-01 12:03    26
2006-08-01 12:04    27
2006-08-01 12:05    28
2006-08-01 12:06    29
...
2006-08-02 12:00    123
2006-08-02 12:01    124
2006-08-02 12:02    125
2006-08-02 12:03    126
2006-08-02 12:04    127
...

I would like to call a custom function (call it custom.func(vals)) on the values from 12:01 - 12:03 (i.e. something similar to zoo::rollapply) every time that interval occurs, which in this example is daily. How would I do that?


NOTES (for robustness, it would also be great to take into account the following edge cases but not necessary):

  1. Don't assume that I have values for 12:01 - 12:03 every day
  2. Don't assume that the entire range 12:01 - 12:03 is present every day. Some days I might only have 12:01 and 12:02 but might be missing 12:03
  3. What if I wanted my custom.func(vals) to be called on day boundaries like using val from 23:58 - 00:12?

Solution

  • Suppose our input is the POSIXct zoo object z given in the Note at the end.

    Create a character vector times with one element per element of z, in the form HH:MM. Then create a logical vector ok indicating which times fall between the boundary values, so that z[ok] is z reduced to those observations. Finally, apply sum (or any other function, such as custom.func) to each day using aggregate.zoo. Days with no observations in the window do not appear in the result, and days with only part of the window are aggregated over whatever values are present:

    times <- format(time(z), "%H:%M")
    ok <- times >= "12:01" & times <= "12:03"
    aggregate(z[ok], as.Date, sum)
    ## 2006-08-01 2006-08-02 
    ##         75        375 
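
As a minimal sketch of the same pattern with a user-supplied function and incomplete days: the data below is made up for illustration (2006-08-02 lacks a 12:03 reading and 2006-08-03 has nothing in the window), and custom.func is a hypothetical stand-in for the function in the question. The grouping uses as.Date(format(...)) so that the date is taken from the printed local time.

```r
library(zoo)

# Illustrative data only: one day with a partial window, one day with none
Lines <- "datetime val
2006-08-01T12:00 23
2006-08-01T12:01 24
2006-08-01T12:02 25
2006-08-01T12:03 26
2006-08-02T12:01 124
2006-08-02T12:02 125
2006-08-03T12:05 200"
z <- read.zoo(text = Lines, tz = "", header = TRUE, format = "%Y-%m-%dT%H:%M")

# Hypothetical stand-in for custom.func(vals): spread of the values in the window
custom.func <- function(vals) max(vals) - min(vals)

times <- format(time(z), "%H:%M")
ok <- times >= "12:01" & times <= "12:03"

# One result per day that has at least one observation in the window
res <- aggregate(z[ok], as.Date(format(time(z)[ok])), custom.func)
print(res)
```

Here 2006-08-03 is absent from res, and 2006-08-02 is computed from its two available values. If a partial window should be rejected instead, custom.func could check length(vals) itself.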
    

    Times straddle midnight

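    This version is for the case where the times straddle midnight. Values at or after 23:58 are assigned to the following date, so each window is labelled with the day on which it ends. Note that the order of values sent to the function is not the original order, but if the function is symmetric that does not matter.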

    times <- format(time(z), "%H:%M")
    ok <- times >= "23:58" | times <= "00:12"
    aggregate(z[ok], (as.Date(format(time(z))) + (times >= "23:58"))[ok], sum)
    ## 2006-08-02 
    ##         41 
    

    Variation

    The prior code chunk works if the function is symmetric in the components of its argument (which is the case for many functions such as mean and sum), but if the function were not symmetric we would need a slightly different approach. We define to.sec, which translates an HH:MM string to numeric seconds, and subtract to.sec("23:58") from each POSIXct datetime so that the window starts at 00:00. The components of z to keep are then those whose shifted times, converted to HH:MM character strings, are at most "00:14", the 14-minute length of the 23:58 - 00:12 window. Grouping by the date of the shifted time labels each window with the day on which it starts.

    to.sec <- function(x) with(read.table(text = x, sep = ":"), 3600 * V1 + 60 * V2)
    times <- format(time(z) - to.sec("23:58"), "%H:%M")
    ok <- times <= "00:14"
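    # format() first so the date is taken in the series' own time zone, as in the previous chunk
    aggregate(z[ok], as.Date(format(time(z)[ok] - to.sec("23:58"))), sum)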
    ## 2006-08-01 
    ##         41 
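
The shifting idea can also be wrapped in a single helper that covers both same-day windows and windows that straddle midnight. The following is only a sketch under stated assumptions: window.apply is a hypothetical name (not part of zoo), the series is read with tz = "UTC" so that subtracting seconds is not affected by daylight-saving changes, and the order-sensitive function (last value minus first) is chosen purely for illustration.

```r
library(zoo)

Lines <- "datetime val
2006-08-01T12:00 23
2006-08-01T12:01 24
2006-08-01T12:02 25
2006-08-01T12:03 26
2006-08-01T12:04 27
2006-08-01T23:58 20
2006-08-02T00:01 21
2006-08-02T12:00 123
2006-08-02T12:01 124
2006-08-02T12:02 125
2006-08-02T12:03 126
2006-08-02T12:04 127"
# tz = "UTC" is an assumption made here to keep the arithmetic free of DST effects
z <- read.zoo(text = Lines, tz = "UTC", header = TRUE, format = "%Y-%m-%dT%H:%M")

# HH:MM string -> seconds since midnight (same definition as above)
to.sec <- function(x) with(read.table(text = x, sep = ":"), 3600 * V1 + 60 * V2)

# Hypothetical helper: apply FUN to each daily occurrence of the window [from, to]
window.apply <- function(z, from, to, FUN) {
  len <- (to.sec(to) - to.sec(from)) %% 86400   # window length in seconds, wraps past midnight
  limit <- sprintf("%02d:%02d", len %/% 3600, (len %% 3600) %/% 60)
  shifted <- time(z) - to.sec(from)             # window start becomes 00:00
  ok <- format(shifted, "%H:%M") <= limit
  # group by the date of the shifted time, i.e. the day on which the window starts
  aggregate(z[ok], as.Date(format(shifted[ok])), FUN)
}

last.minus.first <- function(v) v[length(v)] - v[1]   # not symmetric in its argument

r1 <- window.apply(z, "12:01", "12:03", last.minus.first)
r2 <- window.apply(z, "23:58", "00:12", last.minus.first)
print(r1)
print(r2)
```

Because the comparison is done on shifted times, the same code path handles both cases and no separate "or" condition is needed for windows that cross midnight.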
    

    Note

    Lines <- "datetime val
    2006-08-01T12:00    23
    2006-08-01T12:01    24
    2006-08-01T12:02    25
    2006-08-01T12:03    26
    2006-08-01T12:04    27
    2006-08-01T12:05    28
    2006-08-01T12:06    29
    2006-08-01T23:58    20
    2006-08-02T00:01    21
    2006-08-02T12:00    123
    2006-08-02T12:01    124
    2006-08-02T12:02    125
    2006-08-02T12:03    126
    2006-08-02T12:04    127"
    
    library(zoo)
    z <- read.zoo(text = Lines, tz = "", header = TRUE, format = "%Y-%m-%dT%H:%M")
    

    EDIT

    Have revised the non-symmetric code and simplified all code chunks.