r, time-series, xts

How to use apply.daily/period.apply for calculating maximum per column in XTS time series?


I have a problem using the period.apply function to analyze a high-resolution time series.

I want to calculate statistics (means for different periods, standard deviation, etc.) for my data, which is recorded at 10-minute intervals. Calculating hourly means worked fine, as described in this answer.

It creates a new xts object with means calculated for each column. How do I calculate maximum values for each column?

This reproducible example describes the structure of my data:

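library(xts)

# 1440 timestamps at 10-minute intervals (10 days)
start <- as.POSIXct("2018-05-18 00:00")
tseq <- seq(from = start, length.out = 1440, by = "10 mins")

# Note: set.seed() returns NULL, so passing it in the prob position only
# reseeds the generator before each draw; it does not weight the sample.
Measurings <- data.frame(
  Time = tseq,
  Temp = sample(10:37, 1440, replace = TRUE, set.seed(seed = 10)),
  Variable1 = sample(1:200, 1440, replace = TRUE, set.seed(seed = 187)),
  Variable2 = sample(300:800, 1440, replace = TRUE, set.seed(seed = 333))
)

# Convert to xts: drop the Time column and use it as the index
Measurings_xts <- xts(Measurings[, -1], order.by = Measurings$Time)

# Hourly means, one per column
HourEnds <- endpoints(Measurings_xts, "hours")
Measurings_mean <- period.apply(Measurings_xts, HourEnds, mean)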

I thought it would be easy to just change the function argument from mean to max, like this:

Measurings_max <- period.apply(Measurings_xts, HourEnds, max)

It returns output, but only a single column containing the maximum across all columns for each hour. I need the hourly maximum of each column. A simple solution would be much appreciated.


Solution

  • The mean example works by column because there's a zoo method that calls mean on each column (this method is used because xts extends zoo).

    The max example returns one number per period because there is no max.xts or max.zoo method, so max is applied to the entire xts/zoo block for that period and returns the maximum over all of its columns.

    A simple solution is to define a helper function:

    colMax <- function(x, na.rm = FALSE) {
      apply(x, 2, max, na.rm = na.rm)
    }
    

    Then use that in your period.apply call:

    epHours <- endpoints(Measurings_xts, "hours")
    Measurings_max <- period.apply(Measurings_xts, epHours, colMax)
    head(Measurings_max)
    #                     Temp Variable1 Variable2
    # 2018-05-18 00:50:00   29       194       787
    # 2018-05-18 01:50:00   28       178       605
    # 2018-05-18 02:50:00   26       188       756
    # 2018-05-18 03:50:00   34       152       444
    # 2018-05-18 04:50:00   33       145       724
    # 2018-05-18 05:50:00   35       187       621
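
The same approach should carry over to the other statistics mentioned in the question, such as the standard deviation. The sketch below is only an illustration: byColumn is a hypothetical helper (it is not part of xts or zoo), and the data are a small made-up example rather than the Measurings_xts object.

```r
library(xts)

# Hypothetical helper (not part of xts/zoo): wrap a summary function so
# that it is applied to each column of the block period.apply passes in.
byColumn <- function(f, ...) {
  function(x) apply(x, 2, f, ...)
}

# Small illustrative data set: 12 observations at 10-minute intervals,
# i.e. two full hours. UTC is set explicitly so the hour boundaries
# do not depend on the local time zone.
idx <- seq(as.POSIXct("2018-05-18 00:00", tz = "UTC"),
           length.out = 12, by = "10 mins")
demo_xts <- xts(cbind(A = 1:12, B = 12:1), order.by = idx)

ep <- endpoints(demo_xts, "hours")

# One row per hour, one column per variable
hourly_max <- period.apply(demo_xts, ep, byColumn(max))
hourly_sd  <- period.apply(demo_xts, ep, byColumn(sd))

# Extra arguments such as na.rm are forwarded to the summary function
hourly_max_narm <- period.apply(demo_xts, ep, byColumn(max, na.rm = TRUE))
```

Wrapping the summary function in a closure avoids passing an argument named FUN through period.apply, which would clash with period.apply's own FUN argument.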