Recalling that C = L@L.T when L is the Cholesky factor of a covariance matrix C, I'm trying to repeat the transformations described in this article with Python, but I'm unable to recover the initially given covariance matrix from the dataset generated using it. The question is: how do I get the covariance matrix back?
# Importing libraries
import numpy as np
import pandas as pd
from scipy import linalg as la
import matplotlib.pyplot as plt
sigma = [[9,1], [1,1]] # given Sigma (or covariance matrix)
U = la.cholesky(sigma, lower=False)   # upper-triangular factor: sigma == U.T @ U
rec = U.T @ U                         # reconstruct sigma from the factor
print(pd.DataFrame({'U': U[:, 1], 'rec1': rec[0], 'rec2': rec[1]}))
# generate x,y ~ N(0,1), corr(x,y)=0
xy = U.dot(np.random.normal(0, 1, (2, 1000)))   # note: the factor U is already applied here
##print(xy)
L = U.T
zw = L @ xy   # Z and W variables - correlated
cov = np.cov(zw, ddof=0)   # did we succeed? Compute covariance of transformed data
print(cov)
# [[86.66333557 10.85423541]
# [10.85423541 2.1026003 ]]
plt.scatter(zw[0], zw[1])
plt.show()
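For reference, the matrix printed above does not look arbitrary: since xy was already multiplied by U and then multiplied again by L = U.T, the sample covariance should estimate (U.T @ U) @ (U.T @ U).T, i.e. sigma @ sigma.T, rather than sigma. A quick check of my own (not from the article):
import numpy as np
sigma = np.array([[9, 1], [1, 1]])
print(sigma @ sigma.T)
# [[82 10]
#  [10  2]]  -- close to the sample covariance printed above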
P.S. I used ddof=0 as described here.
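For clarity, ddof=0 just makes np.cov normalize by N instead of N-1. A minimal illustration (variable names are mine, purely for illustration):
import numpy as np
x = np.random.normal(0, 1, (2, 5))
xc = x - x.mean(axis=1, keepdims=True)   # center each row (variable)
print(np.allclose(np.cov(x, ddof=0), xc @ xc.T / x.shape[1]))   # True: divides by N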
Well, the problem in the article still seems to be in U = la.cholesky(sigma, lower=False): when we take L as the lower-triangular factor instead, the example works out fine.
# Importing libraries
import numpy as np
import pandas as pd
from scipy import linalg as la
import matplotlib.pyplot as plt
sigma = np.array([[9,1], [1,1]])
L = la.cholesky(sigma, lower=True)   # lower-triangular factor: sigma == L @ L.T
print(L)
rec = np.dot(L, L.T.conj())          # L @ L.T
print(rec)
print(pd.DataFrame({'L': L[1, :], 'rec1': rec[0], 'rec2': rec[1]}))
##           L  rec1  rec2
## 0  0.333333   9.0   1.0
## 1  0.942809   1.0   1.0
print(np.allclose(rec, sigma, rtol=1.4e-01, atol=1.4e-01))   # True (also passes with the default tolerances)
# generate x,y ~ N(0,1), corr(x,y)=0
xy = np.random.normal(0,1,(2,10000))
zw = L @ xy # Z and W variables - correlated
cov = np.cov(zw, ddof=0)   # did we succeed? Compute covariance of transformed data
print(cov)
##[[9.22566851 1.02844567]
## [1.02844567 0.97824299]]
plt.scatter(zw[0], zw[1])
plt.show()
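As an extra sanity check (not in the article), NumPy's built-in np.random.multivariate_normal performs this kind of transformation internally, and its sample covariance also comes out close to sigma:
import numpy as np
sigma = np.array([[9, 1], [1, 1]])
samples = np.random.multivariate_normal(mean=[0, 0], cov=sigma, size=10000)
print(np.cov(samples.T, ddof=0))   # close to [[9, 1], [1, 1]]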