Tags: r, for-loop, matrix, sparse-matrix

Populating Large Matrix and Computations


I am trying to populate a 25000 x 25000 matrix in a for loop, but R locks up on me. The data has many zero entries, so would a sparse matrix be suitable?

Here is some sample data and code.

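x <- c(1, 3, 0, 4, 1, 0, 4, 1, 1, 4)
y <- x

z <- matrix(NA, nrow = 10, ncol = 10)

for (i in 1:10) {
  if (x[i] == 0) {
    # a zero in x zeroes out the whole row
    z[i, ] <- 0
  } else {
    for (j in 1:10) {
      # 1 where the two values match, 0 otherwise
      z[i, j] <- if (x[i] == y[j]) 1 else 0
    }
  }
}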

One other question: is it possible to do computations on matrices this large? When I run calculations on sample matrices of this size, I either get an output of NA with an integer overflow warning, or R locks up completely.


Solution

  • You could vectorize this, which should help. Also, if your data really is sparse and your analysis can be carried out on a sparse matrix, that is definitely worth considering.

    library(Matrix)
    
    # set up all pairs
    pairs <- expand.grid(x,x)
    # get matrix indices
    idx <- which(pairs[,1] == pairs[,2] & pairs[,1] != 0)
    
    # create a matrix of zeros instead of NAs
    z<-matrix(0,nrow=10,ncol=10)
    z[idx] = 1
    
    # create empty sparse matrix
    z2 <-Matrix(0,nrow=10,ncol=10, sparse=TRUE)
    z2[idx] = 1
    
    all(z == z2)
    [1] TRUE
    

    The suggestion in @alexis_lax's comment makes this even simpler and faster. I had completely forgotten about the outer function.

    # normal matrix
    z = outer(x, x, "==") * (x!=0)
    
    # sparse matrix
    z2 = Matrix(outer(x, x, "==") * (x!=0), sparse=TRUE)
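
    Note that both versions above still build all n^2 pairs as an intermediate, which for n = 25000 is about 625 million entries. Here is a minimal sketch of one way to avoid that, assuming the non-zero values of x fall into reasonably small groups: build a sparse indicator matrix of group membership and take its cross-product. The names n, nz, grp, M and z3 are mine, not from the question.

    ```r
    library(Matrix)

    x <- c(1, 3, 0, 4, 1, 0, 4, 1, 1, 4)
    n <- length(x)

    # positions of the non-zero entries, and an integer group id per distinct value
    # (assumes x has at least one non-zero entry)
    nz  <- which(x != 0)
    grp <- as.integer(factor(x[nz]))

    # n x k indicator matrix: M[i, g] is 1 if x[i] belongs to group g
    M <- sparseMatrix(i = nz, j = grp, x = rep(1, length(nz)),
                      dims = c(n, max(grp)))

    # (M %*% t(M))[i, j] is 1 exactly when x[i] and x[j] are equal and non-zero
    z3 <- tcrossprod(M)

    # compare with the dense result from outer() on the small sample
    all(as.matrix(z3) == outer(x, x, "==") * (x != 0))
    ```

    Whether this helps depends on the data: a non-zero value that occurs m times contributes m^2 ones, so a few very common values can make the result dense even when x itself has many zeros.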
    

    To answer your second question, whether computations can be done on such a big matrix: yes. You just need to approach it more cautiously and use the appropriate tools. Sparse matrices are a good fit here: many typical matrix functions are available for them, and some other packages are compatible with them. Here is a link to a page with some examples.
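
    As for the integer overflow warning, I can't tell from the question which calculation triggered it, but it typically comes from arithmetic on values stored as integers, which R keeps as 32-bit. A small sketch of the problem and one common workaround, converting to double first (the variable big is made up for illustration):

    ```r
    # largest value R's integer type can hold
    .Machine$integer.max

    big <- c(.Machine$integer.max, 1L)

    # the integer sum exceeds the integer range: NA with an "integer overflow" warning
    sum(big)

    # converting to double first avoids the overflow
    sum(as.numeric(big))
    ```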

    Another thought: if you are working with really large matrices, you may want to look into other packages like bigmemory, which are designed to work around R's large memory overhead.