Search code examples
ralgorithmmatrixcorrelation

Complete a positive definite correlation matrix


I want to complete a correlation matrix so that it becomes positive definite, where I only care about the correlations of each variable with the first variable, and the rest can be anything. So for example, I'd like to complete this correlation matrix, filling in the --'s to any set of numbers so that the matrix becomes positive definite.

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,] 1.00 0.76 0.77 0.78 0.79 0.81 0.82 0.83 0.84
 [2,] 0.76 1.00  --   --   --   --   --   --   -- 
 [3,] 0.77  --  1.00  --   --   --   --   --   -- 
 [4,] 0.78  --   --  1.00  --   --   --   --   -- 
 [5,] 0.79  --   --   --  1.00  --   --   --   -- 
 [6,] 0.81  --   --   --   --  1.00  --   --   -- 
 [7,] 0.82  --   --   --   --   --  1.00  --   -- 
 [8,] 0.83  --   --   --   --   --   --  1.00  -- 
 [9,] 0.84  --   --   --   --   --   --   --  1.00

There's functions like nearcor for "fixing" positive definite correlation matrices, but they optimize for all the variables, instead of filling in some, while respecting others. Or there's functions like pos_def_limits for while-looping to fill in a single next correlation value, but it seems intractable to do this for many values at the same time.

(The intended usage of this correlation matrix, is to give it to rnorm_multi to generate data satisfying it.)


Solution

  • I guess you can start constructing a positive definite matrix M from a lower-triangular matrix L with positive diagonal entries, e.g., M = L*L^T, such that x^T*M*x = x^T*L*L^T*x = |L^T*x|^2 should be definitely positive if x>0.

    An implementation for example:

    n <- 9
    L <- diag(n)
    L[-1, 1] <- c(76:79, 81:84) / 100
    v <- 1 - L[, 1]^2
    for (k in 2:length(v)) {
      L[k, 2:k] <- sqrt((r <- runif(k - 1)) / sum(r) * v[k])
    }
    M <- tcrossprod(L)
    

    and we can see

    > M
          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
     [1,] 1.00 0.7600000 0.7700000 0.7800000 0.7900000 0.8100000 0.8200000
     [2,] 0.76 1.0000000 0.8883572 0.7988221 0.7328434 0.7003986 0.7966054
     [3,] 0.77 0.8883572 1.0000000 0.8972778 0.7937081 0.8500339 0.8200438
     [4,] 0.78 0.7988221 0.8972778 1.0000000 0.9093460 0.8651427 0.8668928
     [5,] 0.79 0.7328434 0.7937081 0.9093460 1.0000000 0.8581035 0.9336535
     [6,] 0.81 0.7003986 0.8500339 0.8651427 0.8581035 1.0000000 0.9105348
     [7,] 0.82 0.7966054 0.8200438 0.8668928 0.9336535 0.9105348 1.0000000
     [8,] 0.83 0.6964236 0.7789939 0.7994304 0.8621129 0.9136291 0.9507678
     [9,] 0.84 0.7344271 0.8091267 0.7860827 0.8387232 0.9097534 0.9200662
               [,8]      [,9]
     [1,] 0.8300000 0.8400000
     [2,] 0.6964236 0.7344271
     [3,] 0.7789939 0.8091267
     [4,] 0.7994304 0.7860827
     [5,] 0.8621129 0.8387232
     [6,] 0.9136291 0.9097534
     [7,] 0.9507678 0.9200662
     [8,] 1.0000000 0.9652907
     [9,] 0.9652907 1.0000000
    

    and we can check that all eigen values are greater than 0

    > all(eigen(M)$values > 0)
    [1] TRUE