r performance matrix out-of-memory matrix-multiplication

How to build & store this large lower triangular matrix for matrix-vector multiplication?

I need to create a lower triangular matrix with a special structure then do a matrix-vector multiplication.

The matrix is parameterized by a value k. It main diagonal is a vector of k ^ 0, i.e. 1; the first sub-diagonal is a vector of k ^ 1, and the i-th sub-diagonal holds k ^ i.

Here is a 5 x 5 example with k = 0.9:

structure(c(1, 0.9, 0.81, 0.729, 0.6561, 0, 1, 0.9, 0.81, 0.729, 
0, 0, 1, 0.9, 0.81, 0, 0, 0, 1, 0.9, 0, 0, 0, 0, 1), .Dim = c(5L, 5L))
#       [,1]  [,2] [,3] [,4] [,5]
#[1,] 1.0000 0.000 0.00  0.0    0
#[2,] 0.9000 1.000 0.00  0.0    0
#[3,] 0.8100 0.900 1.00  0.0    0
#[4,] 0.7290 0.810 0.90  1.0    0
#[5,] 0.6561 0.729 0.81  0.9    1

I need to construct such a matrix as large as 100,000 x 100,000 and use it for computation. I need the most efficient storage method for this. Any ideas?

Solution

You don't always need to explicitly form a matrix to do a matrix-vector or matrix-matrix multiplication. For example, no one really forms a diagonal matrix and use it for such computations.

There is no substantial difference between your matrix and a diagonal matrix.

So you reduce the operation to a series of vector addition. Here is a trivial R-level implementation.

MatVecMul <- function (y, k) {
  n <- length(y)
  z <- numeric(n)
  for (i in 1:n) z[i:n] <- z[i:n] + k ^ (i - 1) * y[1:(n - i + 1)]
  z
  }

A comparison with direct matrix construction and computation.

d <- structure(c(1, 0.9, 0.81, 0.729, 0.6561, 0, 1, 0.9, 0.81, 0.729, 
0, 0, 1, 0.9, 0.81, 0, 0, 0, 1, 0.9, 0, 0, 0, 0, 1), .Dim = c(5L, 5L))
set.seed(0); y <- runif(5)
c(d %*% y)
#[1] 0.8966972 1.0725361 1.3374064 1.7765191 2.5070750

MatVecMul(y, 0.9)
#[1] 0.8966972 1.0725361 1.3374064 1.7765191 2.5070750

Can replace the R-level "for" loop easily with Rcpp.

library(Rcpp)
cppFunction("NumericVector MatVecMul_cpp (NumericVector y, double k) {
  int n = y.size();
  NumericVector z(n);
  int i; double *p1, *p2, *end = &z[n];
  double tmp = 1.0;
  for (i = 0; i < n; i++) {
    for (p1 = &z[i], p2 = &y[0]; p1 < end; p1++, p2++) *p1 += tmp * (*p2);
    tmp *= k;
    }
  return z;
  }")

MatVecMul_cpp(y, 0.9)
#[1] 0.8966972 1.0725361 1.3374064 1.7765191 2.5070750

Let's have a benchmark.

v <- runif(1e4)
system.time(MatVecMul(y, 0.9))
#   user  system elapsed 
#  3.196   0.000   3.198 
system.time(MatVecMul_cpp(y, 0.9))
#   user  system elapsed 
#  0.840   0.000   0.841

One caution though: be aware of machine precision. As soon as k ^ (i - 1) becomes too small, you may lose all significant digits during addition. See R: approximating `e = exp(1)` using `(1 + 1 / n) ^ n` gives absurd result when `n` is large. In this example with k = 0.9, there is k ^ 400 = 5e-19. So even though the full matrix is 10000 x 10000, it is numerically banded than lower triangular. This means that we can actually terminate the loop earlier. But I will not implement this.