sfLapply & apply.rolling on an xts object - Resulting Error: subscript out of bounds


My goal is to map the daily returns of 5 stocks (an xts object) to a rolling standard deviation over a 90-day look-back period (that is, the SD of the previous 90 days of returns), keeping the same data structure and running quickly. The approach using the base function "lapply" works well. However, the parallel approach using "sfLapply" from the snowfall package fails. Here is an illustration:

Loading the libraries and simulating a data set:

require(PerformanceAnalytics)
require(quantmod)
require(snowfall)

adjReturns <- replicate(5, rnorm(10000, mean = 0.01, sd = 0.008))
colnames(adjReturns) <- c('stock1','stock2','stock3','stock4','stock5')
timeIndex <- seq.Date(as.Date("2015-01-01", "%Y-%m-%d"), by ="day", length.out = 10000)
adjReturns <- as.xts(adjReturns, order.by = timeIndex)

Calculating the rolling SD with lapply works:

rollingSD <- list()
rollingSD <- lapply(adjReturns, function(x) apply.rolling(x, width = 90, FUN = "sd"))
rollingSD <- do.call(cbind, rollingSD)

Here is the parallel version that did not work:

sfInit(parallel = TRUE, cpus = 4, type = "SOCK", socketHosts = rep("localhost", 2))
sfLibrary(snowfall)
sfLibrary(PerformanceAnalytics)
sfLibrary(xts)
sfLibrary(quantmod)
sfExportAll()

rollingSDSnow <- list()
rollingSDSnow <- sfLapply(adjReturns, function(x) apply.rolling(x, width = 90, FUN = "sd"))
rollingSDSnow <- do.call(cbind, rollingSDSnow)

sfStop()

The code above returns the following error:

Error in `[.xts`(x, i) : subscript out of bounds

I am not sure why I get this error, since I am not even writing my own for loops. Please point out any possible mistakes. Any thoughts would be appreciated, and thanks for helping!

Environment: R: 3.2.0 / RStudio: 0.99.472 / snow: 0.3-13 / snowfall: 1.84-6 / xts: 0.9-7 / PerformanceAnalytics: 1.4.3541

P.S. runSD could have been used instead of apply.rolling; I used apply.rolling because it works with arbitrary functions.


Solution

  • Here's the traceback:

    > rollingSDSnow <- sfLapply(adjReturns, function(x) apply.rolling(x, 90, FUN = sd))
    Error in `[.xts`(x, i) : subscript out of bounds
    > traceback()
    13: stop("subscript out of bounds")
    12: `[.xts`(x, i)
    11: x[i]
    10: FUN(X[[i]], ...)
    9: lapply(splitIndices(length(x), ncl), function(i) x[i])
    8: splitList(x, length(cl))
    7: staticClusterApply(cl, fun, length(x), argfun)
    6: clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...)
    5: lapply(args, enquote)
    4: do.call("fun", lapply(args, enquote))
    3: docall(c, clusterApply(cl, splitList(x, length(cl)), lapply, 
           fun, ...))
    2: parLapply(sfGetCluster(), x, fun, ...)
    1: sfLapply(adjReturns, function(x) apply.rolling(x, 90, FUN = sd))
    

    The splitList function is what fails. It expects a list (the "L" in sfLapply), but you passed an xts object. The length of an xts object is its total number of elements, nrow(x)*ncol(x), so splitIndices generates indices up to that value. x[i] then tries to return the ith row of the xts object, and any i greater than nrow(x) is out of bounds.
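
The mismatch between length and row count is easy to see on a small object. The following is a minimal sketch, assuming only that xts is installed; the object `x` and its dimensions are made up for illustration.

```r
library(xts)

# Hypothetical toy data: 10 daily observations of 2 series.
x <- xts(matrix(rnorm(20), ncol = 2,
                dimnames = list(NULL, c("a", "b"))),
         order.by = as.Date("2015-01-01") + 0:9)

n_elements <- length(x)   # counts every cell, i.e. nrow(x) * ncol(x)
n_rows     <- nrow(x)

# A single subscript on an xts object selects rows, so an index
# beyond nrow(x) fails the same way the splitList call does.
bad_index <- try(x[n_elements], silent = TRUE)
failed    <- inherits(bad_index, "try-error")

# as.list() on an xts object yields one single-column xts per column,
# which is the kind of list that sfLapply can split safely.
cols <- as.list(x)

print(c(length = n_elements, rows = n_rows, list_length = length(cols)))
print(failed)
```

Because lapply calls as.list() on its input, the serial version iterates over columns, while splitList indexes the xts object directly and never gets that conversion.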


    The solution is to use sfApply instead (I'm going to use runSD because I don't want to wait for apply.rolling to finish running).

    rollingSD <- list()
    rollingSD <- lapply(adjReturns, runSD, n=90)
    rollingSD <- do.call(cbind, rollingSD)
    
    sfInit(parallel = TRUE, cpus = 4, type = "SOCK", socketHosts = rep("localhost", 2))
    sfLibrary(snowfall)
    sfLibrary(quantmod)
    sfExportAll()
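    rollingSDSnow <- list()
    # sfApply over columns (MARGIN = 2) returns a plain matrix,
    # so the time index is reattached with xts() afterwards.
    rollingSDSnow <- sfApply(adjReturns, 2, runSD, n=90)
    rollingSDSnow <- xts(rollingSDSnow, index(adjReturns))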
    sfStop()
    
    all.equal(rollingSDSnow, rollingSD)
    # [1] TRUE
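
If apply.rolling is still wanted for its flexibility, another option may be to hand sfLapply an actual list. The sketch below is not part of the original answer. It assumes that as.list() on an xts object splits it into single-column xts objects (which is what lets the plain lapply version work) and that two local worker processes can be started. The reduced data size and the names `rollSD` and `colList` are illustrative only.

```r
library(xts)
library(PerformanceAnalytics)  # provides apply.rolling
library(snowfall)

# Smaller simulated data than in the question so the sketch runs quickly.
set.seed(1)
adjReturns <- replicate(5, rnorm(500, mean = 0.01, sd = 0.008))
colnames(adjReturns) <- paste0("stock", 1:5)
adjReturns <- xts(adjReturns,
                  order.by = seq(as.Date("2015-01-01"), by = "day",
                                 length.out = 500))

# Hypothetical helper: rolling 90-day SD of one return series.
rollSD <- function(x) apply.rolling(x, width = 90, FUN = "sd")

# Serial reference result, computed column by column.
rollingSD <- do.call(cbind, lapply(adjReturns, rollSD))

sfInit(parallel = TRUE, cpus = 2, type = "SOCK")
sfLibrary(PerformanceAnalytics)  # workers need apply.rolling and xts

# Convert to a real list first, so the work is split over 5 columns
# rather than over nrow * ncol individual cells.
colList <- as.list(adjReturns)
rollingSDSnow <- do.call(cbind, sfLapply(colList, rollSD))

sfStop()

print(all.equal(rollingSDSnow, rollingSD))
```

With only five columns there are only five tasks to distribute, so any speed-up is limited by the number of series rather than the number of workers.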