I've been trying to reproduce a cholesky-like covariance decomposition in R - like it is done in Matlab using cholcov()
. Example taken from https://uk.mathworks.com/help/stats/cholcov.html.
Result of the original cholcov()
function as of their example:
T =
-0.2113 0.7887 -0.5774 0
0.7887 -0.2113 -0.5774 0
1.1547 1.1547 1.1547 1.7321
I am trying to replicate this T
in R. I tried:
C1 <- cbind(c(2,1,1,2), c(1,2,1,2), c(1,1,2,2), c(2,2,2,3))
T1 <- chol(C1)
C2 <- t(T1) %*% T1
My result:
[,1] [,2] [,3] [,4]
[1,] 1.414214 0.7071068 0.7071068 1.414214e+00
[2,] 0.000000 1.2247449 0.4082483 8.164966e-01
[3,] 0.000000 0.0000000 1.1547005 5.773503e-01
[4,] 0.000000 0.0000000 0.0000000 1.290478e-08
C2
recovers C1
, but T1
is quite different from MATLAB's solution. I then thought maybe it would be a Cholesky composition of the covariance matrix:
T1 <- chol(cov(C1))
but I get
[,1] [,2] [,3] [,4]
[1,] 0.5773503 0.0000000 0.0000000 2.886751e-01
[2,] 0.0000000 0.5773503 0.0000000 2.886751e-01
[3,] 0.0000000 0.0000000 0.5773503 2.886751e-01
[4,] 0.0000000 0.0000000 0.0000000 3.725290e-09
which is not right either.
Could anyone give me a hint how cholcov()
in Matlab is calculated so that I could replicate it in R?
You are essentially abusing R function chol
in this case. The cholcov
function from MATLAB is a composite function.
On the other hand, chol
from R only does Choleksy factorization. The example you give, C1
, falls into the second case. So, we should resort to eigen
function in R.
E <- eigen(C1, symmetric = TRUE)
#$values
#[1] 7.000000e+00 1.000000e+00 1.000000e+00 2.975357e-17
#
#$vectors
# [,1] [,2] [,3] [,4]
#[1,] -0.4364358 0.000000e+00 8.164966e-01 -0.3779645
#[2,] -0.4364358 -7.071068e-01 -4.082483e-01 -0.3779645
#[3,] -0.4364358 7.071068e-01 -4.082483e-01 -0.3779645
#[4,] -0.6546537 8.967707e-16 -2.410452e-16 0.7559289
V <- E$vectors
D <- sqrt(E$values) ## root eigen values
Since numerical rank is 3, we drop the last eigen value and eigen vector:
V1 <- V[, 1:3]
D1 <- D[1:3]
Thus the factor you want is:
R <- D1 * t(V1) ## diag(D1) %*% t(V1)
# [,1] [,2] [,3] [,4]
#[1,] -1.1547005 -1.1547005 -1.1547005 -1.732051e+00
#[2,] 0.0000000 -0.7071068 0.7071068 8.967707e-16
#[3,] 0.8164966 -0.4082483 -0.4082483 -2.410452e-16
We can verify that:
crossprod(R) ## t(R) %*% R
# [,1] [,2] [,3] [,4]
#[1,] 2 1 1 2
#[2,] 1 2 1 2
#[3,] 1 1 2 2
#[4,] 2 2 2 3
The R
factor above is not as same as the one returned by cholcov
due to different algorithms used for Eigen factorization. R uses LAPACK routine DSYVER
in which some pivoting is done so that eigen values are non-increasing. MATLAB's cholcov
is not open-source, so I'm not sure what algorithm it uses. But it is easy to demonstrate that it does not arrange eigen values in non-increasing order.
Consider the factor T
returned by cholcov
:
T <- structure(c(-0.2113, 0.7887, 1.1547, 0.7887, -0.2113, 1.1547,
-0.5774, -0.5774, 1.1547, 0, 0, 1.7321), .Dim = 3:4)
We can get eigen values by
rowSums(T ^ 2)
# [1] 1.000086 1.000086 7.000167
There are some round-off error because T
is not precise, but we can see clearly that eigen values are 1, 1, 7
. On the other hand, we have 7, 1, 1
from R (recall D1
).