I have 100,000 sparse matrices ("dgCMatrix") stored in a list object. Every matrix has the same number of rows (8,000,000), and the list is approximately 25 GB in size. Now when I do:
do.call(cbind, theListofMatrices)
to combine all the matrices into one big sparse matrix, I get "node stack overflow". In fact, I can't even do this with only 500 elements of that list, which should produce a sparse matrix of only about 100 MB.
My guess is that cbind() converts the sparse matrices into a normal dense matrix and that this causes the stack overflow?
I have also tried something like this:
tmp = do.call(cbind, theListofMatrices[1:400])
This works fine, and tmp is still a sparse matrix, about 95 MB in size. Then I tried:
> tmp = do.call(cbind, theListofMatrices[1:410])
Error in stopifnot(0 <= deparse.level, deparse.level <= 2) :
node stack overflow
and the error occurred. However, I have no trouble doing something like:
cbind(tmp, tmp, tmp, tmp)
Thus, I believe it has something to do with do.call().
Reduce() seems to solve my problem, though I still don't know why do.call() crashes.
The problem is not in do.call() but in the way cbind from the Matrix package is implemented. It uses recursion to bind the individual arguments together. For instance, Matrix::cbind(mat1, mat2, mat3) is translated to something along the lines of Matrix::cbind(mat1, Matrix::cbind(mat2, mat3)).
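To make the recursion pattern concrete, here is a toy model in base R. `rec_bind` is a hypothetical helper written only for illustration, not the Matrix package's actual code: assuming the translation described above, it binds a list of matrices as cbind(m1, cbind(m2, cbind(m3, ...))) and reports how deep the recursion went.

```r
# Hypothetical toy model of recursive binding (NOT Matrix internals):
# cbind(m1, cbind(m2, cbind(m3, ...))), plus the deepest recursion level reached.
rec_bind <- function(mats, level = 1L) {
  if (length(mats) == 1L) {
    return(list(value = mats[[1L]], depth = level))
  }
  rest <- rec_bind(mats[-1L], level + 1L)  # bind the tail first, one level deeper
  list(value = cbind(mats[[1L]], rest$value), depth = rest$depth)
}

mats <- lapply(1:5, function(i) matrix(i, nrow = 2, ncol = 1))
res <- rec_bind(mats)
res$depth        # 5: one nesting level per argument
dim(res$value)   # 2 5
```

The depth equals the number of arguments, so a call with hundreds of arguments nests hundreds of calls deep.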
Since do.call(cbind, theListofMatrices) is basically cbind(theListofMatrices[[1]], theListofMatrices[[2]], ...), you are passing too many arguments to cbind: each additional argument adds another level of nesting, so the recursion ends up nested too deeply and fails.
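A quick way to see that do.call() spreads the list into separate arguments is to count them. `count_args` below is a hypothetical helper that only reports how many arguments it received; the 410-element list mirrors the size at which the question's call started failing.

```r
# Hypothetical helper: reports how many arguments it was called with
count_args <- function(...) nargs()

lst <- lapply(1:410, function(i) matrix(i, nrow = 2, ncol = 1))
n <- do.call(count_args, lst)
n  # 410: every list element becomes its own argument

# On a short list, do.call(cbind, ...) matches writing the arguments out by hand
short <- lst[1:3]
same <- identical(do.call(cbind, short), cbind(short[[1]], short[[2]], short[[3]]))
same  # TRUE
```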
Thus, Ben's suggestion in the comments to use Reduce() is a good way to work around that issue, since it avoids the recursion and replaces it with iteration:
tmp <- Reduce(cbind, theListofMatrices[-1], theListofMatrices[[1]])
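A small, self-contained check of the Reduce() workaround, assuming the Matrix package is installed. The list is made-up stand-in data (500 random sparse 1000 x 2 matrices from rsparsematrix()), not the original 8,000,000-row matrices.

```r
library(Matrix)

set.seed(42)
# Stand-in data (hypothetical): 500 sparse 1000 x 2 dgCMatrix objects
theListofMatrices <- lapply(1:500, function(i) rsparsematrix(1000, 2, density = 0.01))

# Iterative binding: cbind(cbind(cbind(m1, m2), m3), ...), one cbind() per step,
# so the call depth does not grow with the length of the list
tmp <- Reduce(cbind, theListofMatrices[-1], theListofMatrices[[1]])

dim(tmp)                  # 1000 1000
is(tmp, "sparseMatrix")   # TRUE: the result stays sparse

# Sanity check on a short prefix: same values as an explicit cbind() call
small <- theListofMatrices[1:3]
stopifnot(isTRUE(all.equal(as.matrix(Reduce(cbind, small[-1], small[[1]])),
                           as.matrix(cbind(small[[1]], small[[2]], small[[3]])))))
```

Reduce() trades recursion depth for repeated copying: each step rebuilds the growing matrix, so for very long lists the total work grows roughly quadratically with the number of matrices.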