Search code examples
rcalculus

Weird output from double summation code


I am trying to write a code to calculate the following two equations:

enter image description here

So far, I have the following code:

set.seed(123)
a = rnorm(100, 0, 0.02)
b = rnorm(100, 0, 0.03)
c = rnorm(100, 0, 0.04)
dt = cbind(a, b, c)
wg = runif(ncol(dt))
wg = wg/sum(wg)
wg = round(wg, digits = 3)

options(scipen = 999)
cv = cov(dt)
VARs = diag(cv)
cr = cor(dt)
length(wg)

sum_1 = 0
for (i in 1 : length(wg) ) {
  for (j in min(i+1, length(wg)) ) {
    sum_1 = sum_1 + 2 * wg[i] * wg[j] * cr[i, j]
  }}
rho_1 = sum_1/(1 - sum(wg^2))

sum_2 = 0
for (i in 1 : length(wg) ) {
  for (j in min(i+1, length(wg)) ) {
    sum_2 = sum_2 + 2 * wg[i] * wg[j] * sqrt(VARs[i]) * sqrt(VARs[j])
  }}

PV = t(wg) %*% cv %*% wg
ss = sum(wg^2 * VARs)
rho_2 = (PV - ss)/sum_2

rhos = c(rho_1, rho_2)
rhos

[1]  1.388473356 -0.004104948

I expected these two to be close to each other. I think there might be an error in my code. I would appreciate if somebody could verify the code.

Thanks.

Thanks to 李哲源 Zheyuan Li's code, I simulated 100 observations for these two correlation measures and they indeed seem to be close to each other:

enter image description here


Solution

  • Vectorization power for the troublesome summations.

    The first equation:

    # double summation in numerator (including multiplier 2)
    diag(cr) <- 0
    double_sum.1 <- c(crossprod(wg, cr %*% wg))
    
    # single summation in denominator
    single_sum.1 <- c(crossprod(wg))
    
    # result
    double_sum.1 / (1 - single_sum.1)
    

    The second equation:

    dcv <- sqrt(diag(cv))
    
    # single summation in numerator
    single_sum.2 <- c(crossprod(wg * dcv))
    
    # double summation in denominator (including multiplier 2)
    double_sum.2 <- sum(tcrossprod(wg * dcv)) - single_sum.2
    
    # result
    PV <- c(crossprod(wg, cv %*% wg))
    (PV - single_sum.2) / double_sum.2
    

    Fix to original loop:

    i in 1:(length(wg)-1)
    j in (i+1):length(wg)
    

    Remark

    It does not seem to me that these two equations should give close result.

    To see this, we now use true covariance matrix and true correlation matrix in your example. As you generated 3 sets of independent normal samples, so they would have diagonal covariance matrix and identity correlation matrix:

    dcv <- c(0.02, 0.03, 0.04)
    cr <- diag(3)
    

    This immediately implies that the first equation would give 0. But the second, is (do a quick paper work)

    sum_i (1 - w_i) ^ 2 * sigma_i^2
    

    How can this be zero?