Search code examples
rsummultiple-columnsmatrix-multiplicationcross-product

Efficient way to calculate cross products, multiply matrix and sum it up


First, let's produce some dummy data for reference:

X <- matrix(runif(27*27, 0, 1), nrow=27, ncol=27)
Y <- matrix(runif(27*27, 0, 1), nrow=27, ncol=27)

I have 2 matrices, X and Y. First, I'm going to calculate the cross product matrix for the first two column vectors of X using R's command

cp <- tcrossprod(X[,1], X[,2])

The result, cp, is now multiplied with the matrix Y and all products are summed up:

res <- sum(cp * Y, na.rm=T)

Now I'm looking for a fast and thus efficient way to perform this calculation in R for all combinations of column vectors of the matrix X. The results should be saved in a third matrix of same dimensions as X and Y, the matrix Z, at Z[i,j] for the i-th and j-th columns of X.

I already did this job with two stacked for loops:

Z <- matrix(nrow=27, ncol=27)
for (i in 1:ncol(X)) {
 for (j in 1:ncol(X)) {
  cp     <- tcrossprod(X[,i], X[,j])
  Z[i,j] <- sum(cp * Y)
 }
}

However, it isn't as fast as I want it to be.

Thus, I'd be very grateful if you could help me to find a solution which is faster than my stacked for loop solution.

Many thanks in advance!

PS: I have stored 13 matrices X in a list. The calculations should be performed for all of these matrices. However, I assume that once we find an efficient way for the calculation with 1 matrix, I could use this way together with lapply to do the whole operations on the complete list?!


Solution

  • Each element Z[i,j] can be written as a bilinear form. The rest is: puttig together all similar calculations for the matrix Z.
    You can do:

    Z <- t(X) %*% Y %*% X  ### or
    Z <- crossprod(X, Y) %*% X
    

    To compare this calculation with your code:

    set.seed(42)
    n <- 27
    X <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)
    Y <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)
    
    Z <- matrix(nrow=n, ncol=n)
    for (i in 1:ncol(X)) {
      for (j in 1:ncol(X)) {
        cp     <- tcrossprod(X[,i], X[,j])
        Z[i,j] <- sum(cp * Y)
      }
    }
    
    Z2 <- t(X) %*% Y %*% X
    Z3 <- crossprod(X, Y) %*% X
    sum(abs(Z2-Z))
    sum(abs(Z3-Z))
    

    If L is the list of your 13 matrices X. you can do:

    lapply(L, function(X) crossprod(X, Y) %*% X)
    

    Here is the benchmarking:

    Z1 <- function(X) {
      Z <- matrix(nrow=27, ncol=27)
      for (i in 1:ncol(X)) {
        for (j in 1:ncol(X)) {
          cp     <- tcrossprod(X[,i], X[,j])
          Z[i,j] <- sum(cp * Y)
        }
      }
      return(Z)
    }
    
    library("microbenchmark")
    
    microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
    #> microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
    #Unit: microseconds
    # expr      min        lq       mean    median       uq      max neval cld
    #   Z1 3563.167 3671.6355 4391.00888 3721.3380 3874.617 9423.808   100   b
    #   Z2   26.558   27.3420   34.31214   35.5865   39.815   56.426   100  a 
    #   Z3   24.779   25.1675   27.43546   26.0965   28.034   47.268   100  a 
    

    The solutions from Ronak are not faster than the original code, i.e. they are loop-hiding:

    fun <- function(x, y) sum(tcrossprod(X[,x], X[,y]) *Y)
    
    microbenchmark(Z1=Z1(X), 
                   R1=outer(seq_len(ncol(X)), seq_len(ncol(X)), Vectorize(fun)), 
                   R2=t(sapply(seq_len(ncol(X)), function(x) 
                     sapply(seq_len(ncol(X)), function(y)  sum(tcrossprod(X[,x], X[,y]) *Y)))),
                   R3=t(apply(X, 2, function(x) apply(X, 2, function(y) sum(tcrossprod(x, y) *Y)))),
                   unit="relative")
    # Unit: relative
    # expr      min       lq     mean   median       uq       max neval cld
    #   Z1 1.000000 1.000000 1.000000 1.000000 1.000000  1.000000   100  a 
    #   R1 1.207583 1.213846 1.195597 1.216147 1.223139  1.060187   100  ab
    #   R2 1.225521 1.230332 1.487811 1.230852 1.299253 13.140022   100   b
    #   R3 1.156546 1.158774 1.217766 1.160142 2.012623  1.098679   100  ab