Search code examples
pythonmatrix-factorizationcovariance-matrix

reconstruct Covariance matrix from dataset generated given that Covariance matrix (using Cholesky factorization)


Recalling that C = [email protected] using Cholesky factorization, I'm trying to repeat transformations described in this article with Python, but am having some misunderstanding & inability to return initially given Covariance matrix from generated dataset given that covariance matrix. The question is how to get covariance matrix back?

# Importing libraries
import numpy as np
import pandas as pd
from scipy import linalg as la
import matplotlib.pyplot as plt

sigma = [[9,1], [1,1]]   # given Sigma (or covariance matrix)

U= la.cholesky(sigma, lower=False)
rec= U.T@U

print(pd.DataFrame({'U': U[:,1], 'rec1':rec[0],  'rec2':rec[1]} ))

# generate x,y ~ N(0,1), corr(x,y)=0
xy = U.dot(np.random.normal(0,1,(2,1000)))
##print(xy)
L= U.T
zw = L @ xy     # Z and W variables - correlated
cov = np.cov(zw, ddof=0)    # ?????????? /* did we succeed? Compute covariance of transformed data */
print(cov)
# [[86.66333557 10.85423541]
# [10.85423541  2.1026003 ]]

plt.scatter(zw[0], zw[1])
plt.show()

P.S. used ddof=0 as described here


Solution

  • Well, the problem in the article still seems to be in U= la.cholesky(sigma, lower=False). When we take L as Lower Triangular matrix instead, then example becomes OK

    # Importing libraries
    import numpy as np
    import pandas as pd
    from scipy import linalg as la
    import matplotlib.pyplot as plt
    
    sigma = np.array([[9,1], [1,1]])
    
    L= la.cholesky(sigma, lower=True)
    print(L)
    rec= np.dot(L, L.T.conj())           # L.T@L
    print(rec)
    print(pd.DataFrame({'L': L[1, :], 'rec1':rec[0],  'rec2':rec[1]} ))    
    ##  0.333333   9.0   1.0
    ##  0.942809   1.0   1.0
    print(np.allclose( rec, sigma, rtol= 1.4e-01, atol= 1.4e-01))
    
    # generate x,y ~ N(0,1), corr(x,y)=0
    xy = np.random.normal(0,1,(2,10000))
    zw = L @ xy     # Z and W variables - correlated
    cov = np.cov(zw, ddof=0)    #  /* did we succeed? Compute covariance of transformed data */
    print(cov)
    ##[[9.22566851 1.02844567]
    ## [1.02844567 0.97824299]]
    
    plt.scatter(zw[0], zw[1])
    plt.show()