I would like to compute the covariance and correlation matrices without using the built-in functions.
My code:
import numpy as np
from scipy.stats import norm

u1 = 1; u2 = 0; sigma1 = 1; sigma2 = 2; N = 1000
X = norm.rvs(u1, sigma1, size=(1, N))
Y = norm.rvs(u2, sigma2, size=(1, N))
XY = np.concatenate((X, Y))
fact = N - 1
cov_mat = np.dot(XY.T, XY.conj()) / fact
print(cov_mat)
The result:
[[ 0.000136 -0.00045308 0.00041102 ... -0.00066916 -0.00048639
-0.00053653]
[-0.00045308 0.00686947 0.00272365 ... 0.00294479 0.00971417
0.00538347]
[ 0.00041102 0.00272365 0.0043675 ... -0.00147591 0.00471047
0.00112446]
...
[-0.00066916 0.00294479 -0.00147591 ... 0.00338792 0.00347361
0.0031199 ]
[-0.00048639 0.00971417 0.00471047 ... 0.00347361 0.01396131
0.00734893]
[-0.00053653 0.00538347 0.00112446 ... 0.0031199 0.00734893
0.00452921]]
The results are not what I expect. Kindly assist.
Using np.cov(), the result is:
[[0.98423898 0.01737643]
[0.01737643 3.8532223 ]]
Thank you.
The covariance matrix of X and Y looks like this:
| var_X cov_XY |
| cov_XY var_Y |
Here var_X = E[(X - E[X])²], similarly var_Y = E[(Y - E[Y])²], and cov(X, Y) = E[(X - E[X])(Y - E[Y])]. Since cov(X, Y) = cov(Y, X), the matrix is symmetric.
You can estimate those from the samples (note that .mean() divides by N, whereas np.cov divides by N - 1 by default):
>>> var_X = ((X - X.mean())**2).mean()
>>> var_Y = ((Y - Y.mean())**2).mean()
>>> cov_XY = ((X - X.mean())*(Y - Y.mean())).mean()
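The same quantities can also be obtained with a single matrix product, which is closer to the code in the question. The product XY.T @ XY there has shape (N, N) because the transpose is on the wrong side, and the data is never centered. Below is a minimal sketch, assuming real-valued data with one variable per row; the helper name cov_manual and its ddof parameter are illustrative choices, not part of NumPy.

```python
import numpy as np

def cov_manual(data, ddof=1):
    """Covariance matrix of variables stored as rows (shape: n_vars x n_obs).

    Hypothetical helper for illustration; assumes real-valued input.
    """
    data = np.asarray(data, dtype=float)
    n_obs = data.shape[1]
    # Subtract each variable's own mean (row-wise centering).
    centered = data - data.mean(axis=1, keepdims=True)
    # (n_vars x n_obs) @ (n_obs x n_vars) -> (n_vars x n_vars)
    return centered @ centered.T / (n_obs - ddof)

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
N = 1000
X = rng.normal(1, 1, size=(1, N))
Y = rng.normal(0, 2, size=(1, N))
XY = np.concatenate((X, Y))     # shape (2, N)

cov_mat = cov_manual(XY)
print(cov_mat.shape)                     # (2, 2)
print(np.allclose(cov_mat, np.cov(XY)))  # True
```

With ddof=1 the divisor is N - 1, matching the default of np.cov; with ddof=0 it is N, matching the .mean()-based estimates above.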
The correlation matrix of X and Y, however, looks like:
| 1 r_YX |
| r_XY 1 |
where r_XY = cov_XY / (std_X*std_Y), with std_X = sqrt(var_X) and std_Y = sqrt(var_Y). Because cov_XY is symmetric, r_XY = r_YX, so corr(X, Y) is symmetric as well.
Using cov_XY, var_X, and var_Y, the off-diagonal entry of the correlation matrix can be computed with:
>>> corr_XY = cov_XY / np.sqrt(var_X*var_Y)
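To assemble the full correlation matrix rather than just the off-diagonal entry, one option is to divide the covariance matrix elementwise by the outer product of the standard deviations. The following is a rough sketch under the same assumptions as before (real-valued data, one variable per row); corr_manual is a hypothetical helper name.

```python
import numpy as np

def corr_manual(data):
    """Correlation matrix of variables stored as rows (n_vars x n_obs).

    Hypothetical helper for illustration; assumes real-valued input.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / (data.shape[1] - 1)
    # Standard deviations are the square roots of the diagonal (the variances).
    std = np.sqrt(np.diag(cov))
    # Entry (i, j) becomes cov_ij / (std_i * std_j); the diagonal becomes 1.
    return cov / np.outer(std, std)

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
XY = np.vstack((rng.normal(1, 1, 1000), rng.normal(0, 2, 1000)))

corr_mat = corr_manual(XY)
print(np.allclose(corr_mat, np.corrcoef(XY)))  # True
```

The divisor (N or N - 1) cancels in the ratio, so the correlation matrix is the same whichever one is used for the covariance.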