
Rollapply on large sparse matrices


Is there a way to do a rollapply/rollsum to compute row sums on a sparse matrix over windows of fixed length? I'm working with dgTMatrix for convenience, but my problem is not specific to this class. For example, consider generating an 8 x 10 sparse matrix.

library(Matrix)
i <- c(1,3:8); j <- c(2,9,6:10); x <- 7 * (1:7)
A <- sparseMatrix(i, j, x = x, giveCsparse = FALSE)    

> A
8 x 10 sparse Matrix of class "dgTMatrix"

[1,] . 7 . . .  .  .  .  .  .
[2,] . . . . .  .  .  .  .  .
[3,] . . . . .  .  .  . 14  .
[4,] . . . . . 21  .  .  .  .
[5,] . . . . .  . 28  .  .  .
[6,] . . . . .  .  . 35  .  .
[7,] . . . . .  .  .  . 42  .
[8,] . . . . .  .  .  .  . 49

Without first coercing to a dense matrix (e.g. with as.matrix()), one naive approach is to use sapply to compute row sums over each block of window = 2 columns, resulting in an 8 x 5 dense matrix.

window = 2
starts = seq(1,dim(A)[2],by=window)
A_rollsum <- sapply(starts, function(x) Matrix::rowSums(A[, x:(x+window-1)]))

> A_rollsum
     [,1] [,2] [,3] [,4] [,5]
[1,]    7    0    0    0    0
[2,]    0    0    0    0    0
[3,]    0    0    0    0   14
[4,]    0    0   21    0    0
[5,]    0    0    0   28    0
[6,]    0    0    0   35    0
[7,]    0    0    0    0   42
[8,]    0    0    0    0   49

This is not efficient for large sparse matrices: it subsets A once per window and returns a dense result.


Solution

    1) rollapply (from the zoo package) works column by column, and you apparently want row by row, so transpose the matrix, use rollapply as shown and transpose back:

    library(zoo)
    t(rollapply(t(as.matrix(A)), 2, by = 2, sum))
    

    giving:

         [,1] [,2] [,3] [,4] [,5]
    [1,]    7    0    0    0    0
    [2,]    0    0    0    0    0
    [3,]    0    0    0    0   14
    [4,]    0    0   21    0    0
    [5,]    0    0    0   28    0
    [6,]    0    0    0   35    0
    [7,]    0    0    0    0   42
    [8,]    0    0    0    0   49
    

    2) The above uses dense matrices, but if you really need sparse matrices, note that rollapply is a linear operator here, so we can compute its matrix once and then use sparse matrix multiplication:

    d <- rollapply(diag(10), 2, by = 2, sum)
    A %*% t(d)
    

    Old

    The question was changed. This is the answer to the original question.

    Try r1. We show that it equals r2.

    r1 <- rollapply(rowSums(A), 3, c)
    r2 <- rollapply(as.matrix(A), 3, rowSums, by.column = FALSE)
    identical(r1, r2)
    ## [1] TRUE
    

    r1 and therefore also r2 equal:

    > r1
         [,1] [,2] [,3]
    [1,]    7    0   14
    [2,]    0   14   21
    [3,]   14   21   28
    [4,]   21   28   35
    [5,]   28   35   42
    [6,]   35   42   49
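
For reference, the same overlapping windows of row sums can probably be obtained without zoo by using base R's embed(). This is a sketch, not part of the original answer; the names rs and r1_base are made up for illustration. The row sums are taken on the sparse matrix, so only a dense vector of length 8 is created.

```r
library(Matrix)

# Example matrix from the question (8 x 10, seven non-zero entries).
i <- c(1, 3:8); j <- c(2, 9, 6:10); x <- 7 * (1:7)
A <- sparseMatrix(i, j, x = x)

# Row sums computed directly on the sparse matrix.
rs <- Matrix::rowSums(A)

# embed() puts the most recent element of each window first, so the
# columns are reversed to match the oldest-first layout of rollapply(..., c).
r1_base <- embed(rs, 3)[, 3:1]
```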