This is a problem arose in my Markov Chain Monte Carlo (MCMC) algorithm. And I feel this problem is quite commonly encountered especially in hierarchical Gaussian models. Hence it would be great if there is a much more efficient solution. So the problem is like this:
I have many positive integer vectors xi
, for i
from 1 to n, a p.s.d. matrix A
and a p.s.d. matrix B
. For every xi
I want to compute the following Cholesky factorization:
chol( kron( diagmat( xi ), A ) + B )
So kron( diagmat( xi ), A ) + B
is the covariance matrix for a multivariate Gaussian and I want to sample from this Gaussian hence need the Cholesky factorization of it. The dimension for A
and B
are not small and I have a large n
hence computing the above Cholesky factorization for all xi
is really time-consuming. Below is the Rcpp
function I wrote using RcppArmadillo
:
#include <cmath>
#include <Rmath.h>
#include "RcppArmadillo.h"
// [[Rcpp::depends(RcppArmadillo)]]
using namespace arma;
using namespace Rcpp;
// [[Rcpp::export]]
mat test_C(mat A, mat B, mat X){
int n_x = X.n_cols;
int d_B = B.n_rows;
mat sample(d_B, n_x);
mat mS_chol_inv;
for (int i = 0; i < n_x; i++){
mS_chol_inv = inv(trimatu( chol(kron(diagmat( X.col(i) ),A) + B) ));
sample.col(i) = mS_chol_inv*randn(d_B);
}
return(sample);
}
I also test the computational efficiency using the following code comparing it to its R counterpart:
test_R <- function(A,B,X){
n_x <- ncol(X)
d_B <- ncol(B)
res <- sapply(1:n_x, function(x){
mS_chol <- chol( kronecker( diag(X[,x]),A ) + B )
return( mS_chol%*%as.matrix( rnorm(d_B) ) )
})
return(res)
}
# Simulate Data
R1 <- matrix(rnorm(24*2),24,2)
A <- R1%*%t(R1) + 0.1*diag(24)
R2 <- matrix(rnorm(264*2),264,2)
B <- R2%*%t(R2) + 0.1*diag(264)
X <- matrix(rpois(11*2178, 5),11,2178)
res <- benchmark(res_R <- test_R(A, B, X),
res_C <- test_C(A, B, X),
columns=c("test", "replications", "elapsed", "relative"),
order="relative",
replications = 2)
And the result is as follows
> print(res)
test replications elapsed relative
1 res_R <- test_R(A, B, X) 2 18.920 1.000
2 res_C <- test_C(A, B, X) 2 20.724 1.095
As can be seen, a single run is approximately 10 second, and this is simply not feasible in a MCMC algorithm. Also, since the chol()
dominates the computational complexity, the improvement of using Rcpp
over pure R
is trivial. But maybe I did not write the most efficient code? So any advice?
Since the matrix inside chol()
is very structured and the only thing that is varying is xi
, maybe there is some algebra trick that I do not know that can solve this efficiently? I have posted this as a linear-algebra question under Mathematics and here is the link. Unfortunately so far I have not received any solution, people do point out that this is embarrassingly parallel.
Any advice on code/algebra will be helpful! Thanks ahead for your time.
You may try to use Microsoft R from here, artist formerly known as RRO. It integrates with multi-threaded Intel MKL library (same place to find and install it) and on Intel hardware matrix operations are quite fast.
Disclaimer: YMMV