
Cannot update matrix entries for large matrices


I am working with large matrices on a 32 GB RAM Windows PC. For a certain size of matrix I am able to assign to individual cells, as shown here:

> d_time <- matrix(NA, 20000, 100000)
> d_time[1,1]
[1] NA
> d_time[1,1] <- 1
> d_time[1,1]
[1] 1

However, for a larger matrix, the same assignment throws an error:

> d_time <- matrix(NA, 30000, 100000)
> d_time[1,1]
[1] NA
> d_time[1,1] <- 1
Error: cannot allocate vector of size 22.4 Gb
> object.size(d_time)
12000000216 bytes

This second matrix is 11.2 GB in size in RStudio.

Can anyone explain why this is happening, and is there any way I can get this operation done?

Thanks.


Solution

  • NA is of type logical. 1 is of type double.

    typeof(NA)
    #> [1] "logical"
    typeof(1)
    #> [1] "double"
    

    When you initialize the matrix with NA, it is of type logical. If you then assign a double to one element, R has to coerce the whole matrix to double, which means allocating a brand-new matrix. A double matrix requires twice as much memory as a logical matrix of the same size (8 bytes per element instead of 4), and during the coercion both the old and the new matrix must fit in memory at once. With the 11.2 GB logical matrix already held, the additional 22.4 GB allocation pushes you past your 32 GB.
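
    A small example makes the coercion visible, and simple arithmetic with the question's 30000 x 100000 dimensions reproduces both figures reported above (11.2 GB held, 22.4 GB requested):

    m <- matrix(NA, 3, 3)
    typeof(m)
    #> [1] "logical"
    m[1, 1] <- 1  # assigning a double coerces the whole matrix
    typeof(m)
    #> [1] "double"
    
    # logical = 4 bytes per element, double = 8 bytes per element
    30000 * 100000 * 4 / 2^30  # the logical matrix already in memory
    #> [1] 11.17587
    30000 * 100000 * 8 / 2^30  # the new double matrix R tries to allocate
    #> [1] 22.35174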

    If you really need a matrix that large, you have a couple options:

    1. As suggested by @s_baldur, if the elements of the matrix will all be integers, you can initialize the matrix with d_time <- matrix(NA_integer_, 30000, 100000) and update it with, e.g., d_time[1,1] <- 1L. Since integers and logicals are both 4 bytes per element, assigning 1L triggers no coercion, and the matrix requires half as much memory as a double matrix of the same size.
    2. If a relatively large number of cells will remain empty (zero), work with sparse matrices, which store only the non-zero entries (see the sketch below).
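
    A minimal sketch of the sparse approach, assuming the Matrix package is installed (any sparse-matrix package would work similarly):

    library(Matrix)
    
    # An all-zero sparse matrix stores only non-zero entries plus small
    # index vectors, so the 30000 x 100000 shape is cheap to create.
    d_time <- Matrix(0, nrow = 30000, ncol = 100000, sparse = TRUE)
    d_time[1, 1] <- 1
    d_time[1, 1]
    #> [1] 1
    
    object.size(d_time)  # on the order of kilobytes, not gigabytes

    Note that filling a sparse matrix one element at a time is slow. If you have many updates, it is usually better to collect the row indices, column indices, and values first and build the matrix in one call with sparseMatrix().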