Search code examples
rperformancematrixout-of-memorymatrix-multiplication

How to build & store this large lower triangular matrix for matrix-vector multiplication?


I need to create a lower triangular matrix with a special structure then do a matrix-vector multiplication.

The matrix is parameterized by a value k. It main diagonal is a vector of k ^ 0, i.e. 1; the first sub-diagonal is a vector of k ^ 1, and the i-th sub-diagonal holds k ^ i.

Here is a 5 x 5 example with k = 0.9:

structure(c(1, 0.9, 0.81, 0.729, 0.6561, 0, 1, 0.9, 0.81, 0.729, 
0, 0, 1, 0.9, 0.81, 0, 0, 0, 1, 0.9, 0, 0, 0, 0, 1), .Dim = c(5L, 5L))
#       [,1]  [,2] [,3] [,4] [,5]
#[1,] 1.0000 0.000 0.00  0.0    0
#[2,] 0.9000 1.000 0.00  0.0    0
#[3,] 0.8100 0.900 1.00  0.0    0
#[4,] 0.7290 0.810 0.90  1.0    0
#[5,] 0.6561 0.729 0.81  0.9    1

I need to construct such a matrix as large as 100,000 x 100,000 and use it for computation. I need the most efficient storage method for this. Any ideas?


Solution

  • You don't always need to explicitly form a matrix to do a matrix-vector or matrix-matrix multiplication. For example, no one really forms a diagonal matrix and use it for such computations.

    There is no substantial difference between your matrix and a diagonal matrix.

    So you reduce the operation to a series of vector addition. Here is a trivial R-level implementation.

    MatVecMul <- function (y, k) {
      n <- length(y)
      z <- numeric(n)
      for (i in 1:n) z[i:n] <- z[i:n] + k ^ (i - 1) * y[1:(n - i + 1)]
      z
      }
    

    A comparison with direct matrix construction and computation.

    d <- structure(c(1, 0.9, 0.81, 0.729, 0.6561, 0, 1, 0.9, 0.81, 0.729, 
    0, 0, 1, 0.9, 0.81, 0, 0, 0, 1, 0.9, 0, 0, 0, 0, 1), .Dim = c(5L, 5L))
    set.seed(0); y <- runif(5)
    c(d %*% y)
    #[1] 0.8966972 1.0725361 1.3374064 1.7765191 2.5070750
    
    MatVecMul(y, 0.9)
    #[1] 0.8966972 1.0725361 1.3374064 1.7765191 2.5070750
    

    Can replace the R-level "for" loop easily with Rcpp.

    library(Rcpp)
    cppFunction("NumericVector MatVecMul_cpp (NumericVector y, double k) {
      int n = y.size();
      NumericVector z(n);
      int i; double *p1, *p2, *end = &z[n];
      double tmp = 1.0;
      for (i = 0; i < n; i++) {
        for (p1 = &z[i], p2 = &y[0]; p1 < end; p1++, p2++) *p1 += tmp * (*p2);
        tmp *= k;
        }
      return z;
      }")
    
    MatVecMul_cpp(y, 0.9)
    #[1] 0.8966972 1.0725361 1.3374064 1.7765191 2.5070750
    

    Let's have a benchmark.

    v <- runif(1e4)
    system.time(MatVecMul(y, 0.9))
    #   user  system elapsed 
    #  3.196   0.000   3.198 
    system.time(MatVecMul_cpp(y, 0.9))
    #   user  system elapsed 
    #  0.840   0.000   0.841 
    

    One caution though: be aware of machine precision. As soon as k ^ (i - 1) becomes too small, you may lose all significant digits during addition. See R: approximating `e = exp(1)` using `(1 + 1 / n) ^ n` gives absurd result when `n` is large. In this example with k = 0.9, there is k ^ 400 = 5e-19. So even though the full matrix is 10000 x 10000, it is numerically banded than lower triangular. This means that we can actually terminate the loop earlier. But I will not implement this.