First, let's produce some dummy data for reference:
X <- matrix(runif(27*27, 0, 1), nrow=27, ncol=27)
Y <- matrix(runif(27*27, 0, 1), nrow=27, ncol=27)
I have 2 matrices, X and Y. First, I'm going to calculate the cross product matrix for the first two column vectors of X using R's command
cp <- tcrossprod(X[,1], X[,2])
The result, cp
, is now multiplied with the matrix Y and all products are summed up:
res <- sum(cp * Y, na.rm=T)
Now I'm looking for a fast and thus efficient way to perform this calculation in R for all combinations of column vectors of the matrix X. The results should be saved in a third matrix of same dimensions as X and Y, the matrix Z, at Z[i,j] for the i-th and j-th columns of X.
I already did this job with two stacked for loops:
Z <- matrix(nrow=27, ncol=27)
for (i in 1:ncol(X)) {
for (j in 1:ncol(X)) {
cp <- tcrossprod(X[,i], X[,j])
Z[i,j] <- sum(cp * Y)
}
}
However, it isn't as fast as I want it to be.
Thus, I'd be very grateful if you could help me to find a solution which is faster than my stacked for loop solution.
Many thanks in advance!
PS: I have stored 13 matrices X in a list. The calculations should be performed for all of these matrices. However, I assume that once we find an efficient way for the calculation with 1 matrix, I could use this way together with lapply to do the whole operations on the complete list?!
Each element Z[i,j]
can be written as a bilinear form. The rest is: puttig together all similar calculations for the matrix Z
.
You can do:
Z <- t(X) %*% Y %*% X ### or
Z <- crossprod(X, Y) %*% X
To compare this calculation with your code:
set.seed(42)
n <- 27
X <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)
Y <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)
Z <- matrix(nrow=n, ncol=n)
for (i in 1:ncol(X)) {
for (j in 1:ncol(X)) {
cp <- tcrossprod(X[,i], X[,j])
Z[i,j] <- sum(cp * Y)
}
}
Z2 <- t(X) %*% Y %*% X
Z3 <- crossprod(X, Y) %*% X
sum(abs(Z2-Z))
sum(abs(Z3-Z))
If L
is the list of your 13 matrices X. you can do:
lapply(L, function(X) crossprod(X, Y) %*% X)
Here is the benchmarking:
Z1 <- function(X) {
Z <- matrix(nrow=27, ncol=27)
for (i in 1:ncol(X)) {
for (j in 1:ncol(X)) {
cp <- tcrossprod(X[,i], X[,j])
Z[i,j] <- sum(cp * Y)
}
}
return(Z)
}
library("microbenchmark")
microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
#> microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# Z1 3563.167 3671.6355 4391.00888 3721.3380 3874.617 9423.808 100 b
# Z2 26.558 27.3420 34.31214 35.5865 39.815 56.426 100 a
# Z3 24.779 25.1675 27.43546 26.0965 28.034 47.268 100 a
The solutions from Ronak are not faster than the original code, i.e. they are loop-hiding:
fun <- function(x, y) sum(tcrossprod(X[,x], X[,y]) *Y)
microbenchmark(Z1=Z1(X),
R1=outer(seq_len(ncol(X)), seq_len(ncol(X)), Vectorize(fun)),
R2=t(sapply(seq_len(ncol(X)), function(x)
sapply(seq_len(ncol(X)), function(y) sum(tcrossprod(X[,x], X[,y]) *Y)))),
R3=t(apply(X, 2, function(x) apply(X, 2, function(y) sum(tcrossprod(x, y) *Y)))),
unit="relative")
# Unit: relative
# expr min lq mean median uq max neval cld
# Z1 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100 a
# R1 1.207583 1.213846 1.195597 1.216147 1.223139 1.060187 100 ab
# R2 1.225521 1.230332 1.487811 1.230852 1.299253 13.140022 100 b
# R3 1.156546 1.158774 1.217766 1.160142 2.012623 1.098679 100 ab