I am trying to write a code to calculate the following two equations:
So far, I have the following code:
set.seed(123)
a = rnorm(100, 0, 0.02)
b = rnorm(100, 0, 0.03)
c = rnorm(100, 0, 0.04)
dt = cbind(a, b, c)
wg = runif(ncol(dt))
wg = wg/sum(wg)
wg = round(wg, digits = 3)
options(scipen = 999)
cv = cov(dt)
VARs = diag(cv)
cr = cor(dt)
length(wg)
sum_1 = 0
for (i in 1 : length(wg) ) {
for (j in min(i+1, length(wg)) ) {
sum_1 = sum_1 + 2 * wg[i] * wg[j] * cr[i, j]
}}
rho_1 = sum_1/(1 - sum(wg^2))
sum_2 = 0
for (i in 1 : length(wg) ) {
for (j in min(i+1, length(wg)) ) {
sum_2 = sum_2 + 2 * wg[i] * wg[j] * sqrt(VARs[i]) * sqrt(VARs[j])
}}
PV = t(wg) %*% cv %*% wg
ss = sum(wg^2 * VARs)
rho_2 = (PV - ss)/sum_2
rhos = c(rho_1, rho_2)
rhos
[1] 1.388473356 -0.004104948
I expected these two to be close to each other. I think there might be an error in my code. I would appreciate if somebody could verify the code.
Thanks.
Thanks to 李哲源 Zheyuan Li's code, I simulated 100 observations for these two correlation measures and they indeed seem to be close to each other:
Vectorization power for the troublesome summations.
The first equation:
# double summation in numerator (including multiplier 2)
diag(cr) <- 0
double_sum.1 <- c(crossprod(wg, cr %*% wg))
# single summation in denominator
single_sum.1 <- c(crossprod(wg))
# result
double_sum.1 / (1 - single_sum.1)
The second equation:
dcv <- sqrt(diag(cv))
# single summation in numerator
single_sum.2 <- c(crossprod(wg * dcv))
# double summation in denominator (including multiplier 2)
double_sum.2 <- sum(tcrossprod(wg * dcv)) - single_sum.2
# result
PV <- c(crossprod(wg, cv %*% wg))
(PV - single_sum.2) / double_sum.2
Fix to original loop:
i in 1:(length(wg)-1)
j in (i+1):length(wg)
It does not seem to me that these two equations should give close result.
To see this, we now use true covariance matrix and true correlation matrix in your example. As you generated 3 sets of independent normal samples, so they would have diagonal covariance matrix and identity correlation matrix:
dcv <- c(0.02, 0.03, 0.04)
cr <- diag(3)
This immediately implies that the first equation would give 0. But the second, is (do a quick paper work)
sum_i (1 - w_i) ^ 2 * sigma_i^2
How can this be zero?