Search code examples
rloopsvectorizationlarge-data

Vectorization of growth


I am searching for a solution that implements the following simple growth-rate formula by applying vectorization in R:

gr <- function(x){
a <- matrix(,nrow=nrow(x),ncol=ncol(x))
   for (j in 1:ncol(x)){
      for (i in 2:nrow(x)){
        if (!is.na(x[i,j]) & !is.na(x[i-1,j]) & x[i-1,j] != 0){
           result[i,j] <- x[i,j]/x[i-1,j]-1 
        }
       }
    }
return(a)
}

I found the xts package to generate lags of time-series, but in the end I always ended up having to compare to many values (see above), so I cannot simply use ifelse. One possible problem is when the time-series (e.g. a price index) has zeros in between. This would create NaNs in the result, which I am trying to avoid and which cannot simply be removed afterwards (edit: apparently they can, see the answers below!)

In short: I'd like to produce a table of correct growth rates for a given table of values. Here is an example:

m <- matrix(c(1:3,NA,2.4,2.8,3.9,0,1,3,0,2,1.3,2,NA,7,3.9,2.4),6,3)

generates:

      [,1] [,2] [,3]
[1,]  1.0  3.9  1.3
[2,]  2.0  0.0  2.0
[3,]  3.0  1.0   NA
[4,]   NA  3.0  7.0
[5,]  2.4  0.0  3.9
[6,]  2.8  2.0  2.4

correct result, produced by gr(m):

           [,1] [,2]       [,3]
[1,]        NA   NA         NA
[2,] 1.0000000   -1  0.5384615
[3,] 0.5000000   NA         NA
[4,]        NA    2         NA
[5,]        NA   -1 -0.4428571
[6,] 0.1666667   NA -0.3846154

But this takes forever with large tables. Is there any way to use conditions without looping so extensively?


Solution

  • You can speed this up by performing the entire calculation in a single vectorized operation (with one additional operation to fix up the results whenever you divide by 0):

    out <- rbind(NA, tail(m, -1) / head(m, -1) - 1)
    out[!is.finite(out)] <- NA
    out
    #           [,1] [,2]       [,3]
    #             NA   NA         NA
    # [2,] 1.0000000   -1  0.5384615
    # [3,] 0.5000000   NA         NA
    # [4,]        NA    2         NA
    # [5,]        NA   -1 -0.4428571
    # [6,] 0.1666667   NA -0.3846154
    

    This is much faster than a looping solution, as demonstrated on a 1000 x 1000 example:

    set.seed(144)
    m <- matrix(rnorm(10000000), 10000, 1000)
    system.time(j <- josilber(m))
    #    user  system elapsed 
    #   1.425   0.030   1.446 
    system.time(g <- gr(m))
    #    user  system elapsed 
    #  34.551   0.263  36.581 
    

    The vectorized solution provides a 25x speedup.