
Implementing Covariance and Correlation Matrix in Python


I would like to implement the covariance and correlation matrices without using the built-in functions.

My code:

import numpy as np
from scipy.stats import norm

u1 = 1; u2 = 0; sigma1 = 1; sigma2 = 2; N = 1000

X = norm.rvs(u1, sigma1, size=(1, N))
Y = norm.rvs(u2, sigma2, size=(1, N))
XY = np.concatenate((X, Y))
fact = N - 1
cov_mat = np.dot(XY.T, XY.conj()) / fact
print(cov_mat)

The result:

[[ 0.000136   -0.00045308  0.00041102 ... -0.00066916 -0.00048639
  -0.00053653]
 [-0.00045308  0.00686947  0.00272365 ...  0.00294479  0.00971417
   0.00538347]
 [ 0.00041102  0.00272365  0.0043675  ... -0.00147591  0.00471047
   0.00112446]
 ...
 [-0.00066916  0.00294479 -0.00147591 ...  0.00338792  0.00347361
   0.0031199 ]
 [-0.00048639  0.00971417  0.00471047 ...  0.00347361  0.01396131
   0.00734893]
 [-0.00053653  0.00538347  0.00112446 ...  0.0031199   0.00734893
   0.00452921]]

The results are not what I expect: I get an N x N matrix instead of a 2 x 2 one. Kindly assist.

Using np.cov(), the result is:

[[0.98423898 0.01737643]
 [0.01737643 3.8532223 ]]

Thank you.


Solution

  • The covariance matrix between X and Y looks like this:

    |  var_X  cov_XY |
    | cov_XY   var_Y |
    

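    You have var_X = E[(X - E[X])²], similarly var_Y = E[(Y - E[Y])²], and cov(X, Y) = E[(X - E[X])(Y - E[Y])]. The covariance is symmetric: cov(X, Y) = cov(Y, X).

    You can estimate those with the expressions below. Note that .mean() divides by N, whereas np.cov() divides by N - 1 by default, so the values differ slightly: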

    >>> var_X = ((X - X.mean())**2).mean()
    
    >>> var_Y = ((Y - Y.mean())**2).mean()
    
    >>> cov_XY = ((X - X.mean())*(Y - Y.mean())).mean()
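The same quantities can also be obtained in matrix form, which is closer to what the question attempted. The sketch below is only an illustration under a few assumptions: it draws the samples with NumPy's random generator instead of scipy's norm.rvs, the seed is arbitrary, and the names centered and rng are not from the original code. It centers each row of the (2, N) array first and multiplies it by its own transpose, so the product is 2 x 2. The question's XY.T @ XY multiplies an (N, 2) array by a (2, N) array, which is why it produced an N x N matrix, and it also skipped the mean subtraction.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
N = 1000
X = rng.normal(loc=1, scale=1, size=N)
Y = rng.normal(loc=0, scale=2, size=N)

XY = np.stack((X, Y))  # shape (2, N): one row per variable

# Subtract each variable's mean from its own row.
centered = XY - XY.mean(axis=1, keepdims=True)

# (2, N) @ (N, 2) -> (2, 2). Dividing by N - 1 matches np.cov's default.
cov_mat = centered @ centered.T / (N - 1)

print(cov_mat)
print(np.allclose(cov_mat, np.cov(XY)))  # True
```

Dividing by N instead of N - 1 would reproduce the .mean()-based values above, which correspond to np.cov(XY, bias=True).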
    

    The correlation matrix between X and Y, however, looks like this:

    |   1     r_XY |
    |  r_YX      1 |
    

    Here r_XY = cov_XY / (std_X*std_Y). Because cov_XY is symmetric, r_XY = r_YX, so the correlation matrix is symmetric as well.

    Using cov_XY, var_X, and var_Y, the off-diagonal entry of the correlation matrix can be computed from:

    >>> corr_XY = cov_XY / np.sqrt(var_X*var_Y)
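Putting the pieces together, a minimal sketch of the full 2 x 2 correlation matrix might look like the following. As an assumption for illustration, the samples are drawn with NumPy's random generator rather than scipy's norm.rvs, with an arbitrary seed; the names r_XY, corr_mat, and rng are not from the original code. The result is compared against np.corrcoef as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
N = 1000
X = rng.normal(loc=1, scale=1, size=N)
Y = rng.normal(loc=0, scale=2, size=N)

# Variances and covariance, estimated as in the answer above.
var_X = ((X - X.mean()) ** 2).mean()
var_Y = ((Y - Y.mean()) ** 2).mean()
cov_XY = ((X - X.mean()) * (Y - Y.mean())).mean()

# Correlation coefficient; the diagonal of the matrix is 1 by definition.
r_XY = cov_XY / np.sqrt(var_X * var_Y)
corr_mat = np.array([[1.0, r_XY],
                     [r_XY, 1.0]])

print(corr_mat)
print(np.allclose(corr_mat, np.corrcoef(X, Y)))  # True
```

The choice between dividing by N or N - 1 does not matter here, because the same factor appears in the numerator and the denominator of r_XY and cancels.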