Suppose I have a large sparse matrix and want to square it.
However, m %*% m
takes up alot of memory, e.g. 20-40 Gb.
Are there more memory efficient ways to do it? E.g. packages, etc.?
Maybe sparseMatrix
from Matrix
will be memory efficient.
library(Matrix)
m <- sparseMatrix(1:5, 1:5, x=1:5)
m %*% m
#m^2 #Alternative
#5 x 5 sparse Matrix of class "dgCMatrix"
#
#[1,] 1 . . . .
#[2,] . 4 . . .
#[3,] . . 9 . .
#[4,] . . . 16 .
#[5,] . . . . 25
Another option could be the slam
package:
library(slam)
m <- simple_triplet_matrix(1:5, 1:5, 1:5)
m^2
or spray
:
library(spray)
m <- spray(cbind(1:5, 1:5), 1:5)
m$value <- m$value^2
Comparing the methods:
n <- 1e3
bench::mark(check = FALSE
, base = {m <- diag(1:n); m %*% m}
, base2 = {m <- diag(1:n); m^2}
, Matrix = {m <- sparseMatrix(1:n, 1:n, x=1:n); m %*% m}
, Matrix2 = {m <- sparseMatrix(1:n, 1:n, x=1:n); m^2}
, slam = {m <- simple_triplet_matrix(1:n, 1:n, 1:n); m^2}
, spray = {m <- spray(cbind(1:n, 1:n), 1:n); m$value <- m$value^2; m}
)
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
#1 base 711.65ms 711.65ms 1.41 26.7MB 1.41 1 1
#2 base2 2.48ms 2.66ms 236. 11.4MB 76.7 123 40
#3 Matrix 687.17µs 712.63µs 1379. 63.1KB 4.00 690 2
#4 Matrix2 610.89µs 637.48µs 1541. 55.2KB 4.00 771 2
#5 slam 36.1µs 39.18µs 15309. 19.7KB 4.00 7652 2
#6 spray 2.49ms 2.76ms 338. 81.3KB 5.97 170 3
It looks like that slam
is most efficient in memory usage and also in speed.