
Getting "node stack overflow" when cbind-ing multiple sparse matrices


I have 100,000 sparse matrices ("dgCMatrix") stored in a list. Every matrix has the same number of rows (8,000,000), and the list is approximately 25 GB in size. Now when I do:

do.call(cbind, theListofMatrices)

to combine all the matrices into one big sparse matrix, I get "node stack overflow". In fact, I can't even do this with only 500 elements of that list, which should produce a sparse matrix of only about 100 MB.

Could it be that cbind() converts the sparse matrices into ordinary dense matrices and that this causes the stack overflow?

I have also tried this:

tmp = do.call(cbind, theListofMatrices[1:400])

This works fine, and tmp is still a sparse matrix, about 95 MB in size. Then I tried:

> tmp = do.call(cbind, theListofMatrices[1:410])
Error in stopifnot(0 <= deparse.level, deparse.level <= 2) : 
  node stack overflow

and the error occurred. However, I have no trouble doing something like:

cbind(tmp, tmp, tmp, tmp)

Thus, I believe the problem has something to do with do.call().

Reduce() seems to solve my problem, though I still don't understand why do.call() crashes.


Solution

  • The problem is not in do.call() but in the way cbind from the Matrix package is implemented: it uses recursion to bind the individual arguments together. For instance, Matrix::cbind(mat1, mat2, mat3) is translated into something along the lines of Matrix::cbind(mat1, Matrix::cbind(mat2, mat3)). Since do.call(cbind, theListofMatrices) is essentially cbind(theListofMatrices[[1]], theListofMatrices[[2]], ...), passing hundreds of arguments produces a recursion that is nested too deeply, and the call fails.

    Thus, Ben's comment to use Reduce() is a good way to work around that issue since it avoids the recursion and replaces it with an iteration:

    tmp <- Reduce(cbind, theListofMatrices[-1], theListofMatrices[[1]])
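To make the recursion-versus-iteration distinction concrete, here is a minimal base-R sketch. rec_cbind() and iter_cbind() are hypothetical helpers written only for illustration: rec_cbind() imitates the nesting pattern described above and is not the actual Matrix source. Ordinary dense matrices stand in for the dgCMatrix objects so the example runs without extra packages.

```r
# Hypothetical helper that mimics the recursive pattern described above:
# bind the first argument to the result of recursively binding the rest.
# (Illustration only; this is NOT the real Matrix/methods implementation.)
rec_cbind <- function(...) {
  args <- list(...)
  if (length(args) <= 2L) {
    return(do.call(cbind, args))
  }
  # One extra nested call per remaining argument, so the depth grows
  # with the number of matrices passed in
  cbind(args[[1L]], do.call(rec_cbind, args[-1L]))
}

# Iterative alternative: Reduce() folds the list left to right in a loop,
# so the call depth stays constant however many matrices there are
iter_cbind <- function(mats) {
  Reduce(cbind, mats[-1L], mats[[1L]])
}

# Small dense stand-ins for the sparse matrices in the question
mats <- lapply(1:5, function(k) matrix(k, nrow = 3, ncol = 2))

r1 <- do.call(rec_cbind, mats)
r2 <- iter_cbind(mats)
identical(r1, r2)  # TRUE
```

With a handful of matrices both versions give identical results; the only difference is that every extra argument to rec_cbind() adds another level of nesting, whereas Reduce() never nests deeper than a single cbind() call.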
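One caveat with the Reduce() workaround is that each step builds a new matrix from the growing accumulator, so the columns bound so far are typically copied again at every one of the roughly 100,000 steps. A possible middle ground is to bind in fixed-size chunks: each cbind() call stays short enough to avoid deep recursion, while each column is copied only a few times. The sketch below assumes a hypothetical helper chunked_cbind() with a chunk_size default of 100; neither is part of base R or the Matrix package. The demo uses base matrices so it runs without Matrix.

```r
# Hypothetical helper (not part of base R or Matrix): bind a long list of
# matrices in fixed-size chunks so that no single cbind() call receives
# more than `chunk_size` arguments.
chunked_cbind <- function(mats, chunk_size = 100L) {
  stopifnot(length(mats) >= 1L, chunk_size >= 2L)
  mats <- unname(mats)  # pass matrices positionally, never as named args
  while (length(mats) > 1L) {
    # Split the indices into consecutive groups of at most chunk_size
    groups <- unname(split(seq_along(mats),
                           ceiling(seq_along(mats) / chunk_size)))
    # Bind each group with one bounded-size call; the list shrinks by a
    # factor of about chunk_size per round, so few rounds are needed
    mats <- lapply(groups, function(idx) do.call(cbind, mats[idx]))
  }
  mats[[1L]]
}

# Demo with small dense stand-ins (250 one-column matrices)
mats <- lapply(1:250, function(k) matrix(k, nrow = 2, ncol = 1))
res <- chunked_cbind(mats, chunk_size = 100L)
dim(res)  # 2 250
```

With the Matrix package loaded, the call would be chunked_cbind(theListofMatrices, chunk_size = 100). The question reports that 400 matrices bind fine in one call but 410 do not, so 100 leaves some headroom; the safe limit may well differ with the R version and stack settings.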