I'm trying to make sense of the result of a covariance matrix. I know that if a covariance is positive (> 0), the two arrays tend to move in the same direction.
import numpy as np
x = np.array([[10,39,19,23,28],
              [43,13,32,21,20],
              [15,16,22,85,15]])
print(np.cov(x))
How do I interpret this result?
[[ 115.7 -120.55 -18.6 ]
[-120.55 138.7 -76.35]
[ -18.6 -76.35 933.3 ]]
Edit: in addition to Luca's answer, I've added a simple line graph to help visualise the spread (variance) and movement (covariance) of the data.
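A covariance matrix is an n×n symmetric matrix, where n is the number of variables in the data you start with, and it shows how those variables covary, i.e. how they tend to move with respect to one another. Note that np.cov treats each row of its input as a variable by default (rowvar=True), so for the 3×5 array above n is the number of rows (3), not the number of columns.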
The main diagonal holds the variance of each vector, since var(X) = cov(X, X); every other entry (i, j) holds the covariance between vectors i and j.
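A quick check of the shape, the symmetry and the diagonal, assuming the x array from the question. This is only a sketch; the ddof=1 passed to np.var matches np.cov's default of dividing by n - 1.

```python
import numpy as np

# Same data as in the question: 3 variables (rows), 5 observations (columns)
x = np.array([[10, 39, 19, 23, 28],
              [43, 13, 32, 21, 20],
              [15, 16, 22, 85, 15]])

c = np.cov(x)                  # rows are variables by default (rowvar=True)
print(c.shape)                 # → (3, 3): one row/column per variable
print(np.allclose(c, c.T))     # → True: cov(X, Y) == cov(Y, X)

# The diagonal holds the sample variances (np.cov divides by n - 1)
variances = np.var(x, axis=1, ddof=1)
print(np.allclose(np.diag(c), variances))  # → True

# Treating the 5 columns as variables instead gives a 5x5 matrix
print(np.cov(x, rowvar=False).shape)       # → (5, 5)
```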
No value on the main diagonal can be negative, since each one is a variance. Every other entry can be written as the product of the two standard deviations s(X) and s(Y), which are always non-negative, and the Pearson correlation coefficient p(X, Y), which ranges over [-1, 1]. The correlation coefficient alone therefore decides whether an entry is positive or negative:
cov(X, Y) = p(X,Y)s(X)s(Y)
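The identity above can be checked numerically on the question's data. A minimal sketch, assuming sample statistics throughout (ddof=1 for the standard deviations, to match np.cov's default):

```python
import numpy as np

x = np.array([[10, 39, 19, 23, 28],
              [43, 13, 32, 21, 20],
              [15, 16, 22, 85, 15]])

cov = np.cov(x)                 # sample covariance (divides by n - 1)
p = np.corrcoef(x)              # Pearson correlation coefficients, in [-1, 1]
s = np.std(x, axis=1, ddof=1)   # sample standard deviations, always >= 0

# cov(X, Y) = p(X, Y) * s(X) * s(Y) for every pair at once, via an outer product
rebuilt = p * np.outer(s, s)
print(np.allclose(cov, rebuilt))  # → True
```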
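There are three possibilities: if p(X, Y) > 0, X and Y tend to move in the same direction and the covariance is positive; if p(X, Y) < 0, they tend to move in opposite directions and the covariance is negative; if p(X, Y) = 0, there is no linear relationship between them and the covariance is zero.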
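A toy illustration of the sign cases, using made-up arrays chosen only for this purpose (they are not from the question). The "unrelated" array is symmetric around the middle of a, so it has no linear trend with a even though it is not random; a zero covariance only rules out a linear relationship.

```python
import numpy as np

# Hypothetical toy data, chosen only to show each sign
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
same = 2 * a + 1                                  # moves with a
opposite = -3 * a + 10                            # moves against a
unrelated = np.array([1.0, -1.0, 0.0, -1.0, 1.0]) # no linear trend with a

for name, b in [("same", same), ("opposite", opposite), ("unrelated", unrelated)]:
    c = np.cov(a, b)[0, 1]        # off-diagonal entry of the 2x2 matrix = cov(a, b)
    r = np.corrcoef(a, b)[0, 1]   # Pearson p(a, b)
    print(f"{name:9s} cov={c:+.2f} p={r:+.2f}")
```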
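The standard deviations only affect the magnitude of the entries: the more spread out the data, the larger the absolute covariance, even when the correlation stays the same. A large covariance therefore does not by itself mean a stronger relationship, which is why the correlation matrix is easier to compare across variables.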
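To see the scaling effect, here is a small sketch that stretches the first variable of the question's data by a factor of 10 (an arbitrary choice for illustration): the covariances involving it grow, its variance grows 100x, and the correlation matrix does not change.

```python
import numpy as np

x = np.array([[10, 39, 19, 23, 28],
              [43, 13, 32, 21, 20],
              [15, 16, 22, 85, 15]], dtype=float)

scaled = x.copy()
scaled[0] *= 10  # stretch the first variable: its standard deviation grows 10x

# Covariances involving row 0 grow 10x (and its variance grows 100x)...
print(np.cov(x)[0, 1], np.cov(scaled)[0, 1])
# ...but the correlations are unchanged, because the std devs are divided out
print(np.allclose(np.corrcoef(x), np.corrcoef(scaled)))  # → True
```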
To visualise the content of the matrices I am using the heatmap function from the seaborn Python package. I have also added the correlation matrix so the two can be compared side by side.
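import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

# Each row of x is one variable (np.cov's default rowvar=True)
x = np.array([[10, 39, 19, 23, 28],
              [43, 13, 32, 21, 20],
              [15, 16, 22, 85, 15]])

labels = list(range(len(x)))  # one label per variable (row), starting at 0
fig, axes = plt.subplots(1, 2, figsize=(10, 5))

# Left panel: covariance matrix (variances on the diagonal)
sns.heatmap(np.cov(x), ax=axes[0], annot=True, cbar=False, fmt=".2f",
            cmap="YlGnBu", square=True,
            xticklabels=labels, yticklabels=labels)
axes[0].set_title("Covariance matrix")

# Right panel: correlation matrix (covariances rescaled to [-1, 1])
sns.heatmap(np.corrcoef(x), ax=axes[1], annot=True, cbar=False, fmt=".2f",
            cmap="YlGnBu", square=True,
            xticklabels=labels, yticklabels=labels)
axes[1].set_title("Correlation matrix")

plt.tight_layout()
plt.show()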
Output: [figure: side-by-side heatmaps titled "Covariance matrix" and "Correlation matrix"]
Compared with the others, the third vector (label 2 in the heatmaps) has an exceptionally high variance. All pairs of vectors are negatively correlated: the first and second vectors (labels 0 and 1) are strongly correlated (p ≈ -0.95), while the first and third (labels 0 and 2) are the least correlated (p ≈ -0.06).
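The same reading can be done without the plot. A rough sketch, assuming the question's data and 0-based indices (matching the heatmap labels): it finds the variable with the largest variance and ranks every pair by the absolute value of its correlation, weakest first.

```python
import numpy as np
from itertools import combinations

x = np.array([[10, 39, 19, 23, 28],
              [43, 13, 32, 21, 20],
              [15, 16, 22, 85, 15]])

corr = np.corrcoef(x)
variances = np.diag(np.cov(x))

# 0-based index of the variable with the largest variance
print("largest variance:", variances.argmax())

# Rank each pair of variables by |p|, weakest correlation first
pairs = sorted(combinations(range(len(x)), 2), key=lambda ij: abs(corr[ij]))
for i, j in pairs:
    print(f"vectors {i} and {j}: p = {corr[i, j]:+.2f}")
```

With the question's data this should report index 2 as the largest variance and list pair (0, 2) first and pair (0, 1) last.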