r, time-series, data.table, xts

Rolling over function with 2 vector arguments


I want to apply a rolling function that requires 2 vector arguments. Here is an example (that doesn't work) using data.table:

library(data.table)
df <- data.table(x = 1:100, y = 101:200)
my_sum <- function(x, y) {
  x <- log(x)
  y <- x * y
  return(x + y)
}
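# Fails: frollapply() rolls over each column separately, so the function
# receives a single vector and `y` is missing
roll_df <- frollapply(df, 10, function(x, y) {
  my_sum(x, y)})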

It doesn't recognize the y column. Of course, the solution can use xts or some other package.

EDIT: This is the real function I want to apply:

library(dpseg)
dpseg_roll <- function(time, price) {
  p <- estimateP(x=time, y=price, plot=FALSE)
  segs <- dpseg(time, price, jumps=jumps, P=p, type=type, store.matrix=TRUE)
  slope_last <- segs$segments$slope[length(segs$segments$slope)]
  return(slope_last)
}

Solution

  • With runner you can apply any function in a rolling window. A running window can also be created over the rows of a data.frame passed to the x argument. Let's focus on the simpler function my_sum. The f argument of runner accepts only one object (data in this case). I encourage you to put browser() in the function and debug row by row before you apply some fancy model to the subset (some algorithms require a minimum number of observations).

    my_sum <- function(data) {
      # browser()
      x <- log(data$x)
      y <- x * data$y
      tail(x + y, 1) # return only one value
    }
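
    As a quick sanity check before rolling, the function can be called by hand on a single window. This is only a sketch in base R: the data frame mirrors the one in the question, and first_window is a name made up for illustration.

    ```r
    df <- data.frame(x = 1:100, y = 101:200)

    my_sum <- function(data) {
      x <- log(data$x)
      y <- x * data$y
      tail(x + y, 1)  # keep only the value for the last row of the window
    }

    # first_window is a hypothetical name: rows 1..10 form one window of size 10
    first_window <- df[1:10, ]
    res <- my_sum(first_window)
    length(res)  # one value per window, so the rolled result can be a plain vector
    ```

    If this call errors or returns more than one value, the rolling call would run into the same problem on every window.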
    

    my_sum should return only one value, because runner computes one result for each row; if my_sum returned a vector, you would get a list. Because runner is an independent function, you need to pass the data.table object to x. The best way to do this is to use x = .SD, which inside df[...] refers to the data itself (see here why).

    library(runner)

    # .SD passes the whole data.table to runner; each window holds up to 10 rows
    df[,
       new_col := runner(
         x = .SD,
         f = my_sum,
         k = 10
       )]
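
    If adding a package is not an option, the same idea can be sketched in base R by rolling over row indices instead of over a single column. This is not how runner works internally, just a minimal illustration; roll_apply2 is a hypothetical helper name, and padding incomplete windows with NA is an assumption made here.

    ```r
    # Hypothetical helper: apply f(x_window, y_window) over a trailing window of k rows.
    roll_apply2 <- function(x, y, k, f) {
      stopifnot(length(x) == length(y), k >= 1)
      vapply(seq_along(x), function(i) {
        if (i < k) return(NA_real_)  # assumption: incomplete windows are padded with NA
        idx <- (i - k + 1):i         # row indices of the current window
        f(x[idx], y[idx])
      }, numeric(1))
    }

    # Two-argument version from the question, reduced to one value per window
    my_sum2 <- function(x, y) {
      x <- log(x)
      y <- x * y
      tail(x + y, 1)
    }

    df2 <- data.frame(x = 1:100, y = 101:200)
    df2$new_col <- roll_apply2(df2$x, df2$y, k = 10, f = my_sum2)
    ```

    Because the helper slices both vectors with the same indices, any function of two vectors that returns a single number (such as dpseg_roll above) could be passed as f.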