python-3.x numpy covariance-matrix

interpretation of covariance result matrix


I'm trying to make sense of the output of a covariance matrix. I know that if an off-diagonal entry is > 0, the two arrays it refers to tend to move in the same direction.

import numpy as np

x = np.array([[10,39,19,23,28],
              [43,13,32,21,20],
              [15,16,22,85,15]])

print(np.cov(x))

How to interpret this result?

[[ 115.7  -120.55  -18.6 ]
 [-120.55  138.7   -76.35]
 [ -18.6   -76.35  933.3 ]]

Edit: in addition to Luca's answer, I've added a simple line graph to help visualise the spread (variance) and movement (covariance) of data.

[Image: line graph of the three arrays]


Solution

  • Covariance Matrix

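    A covariance matrix is an n x n symmetric matrix, where n is the number of variables. With the default rowvar=True of np.cov, each row of the input is a variable and each column an observation, so the 3x5 array in the question produces a 3x3 matrix. It shows how the variables covary, meaning how they tend to move with respect to one another.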
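
    As a quick check of the shape and symmetry described above, here is a minimal sketch using the array from the question. The variable names cov_rows and cov_cols are only illustrative.

```python
import numpy as np

x = np.array([[10, 39, 19, 23, 28],
              [43, 13, 32, 21, 20],
              [15, 16, 22, 85, 15]])

# Default rowvar=True: each row is a variable, each column an observation.
cov_rows = np.cov(x)
print(cov_rows.shape)                      # (3, 3)
print(np.allclose(cov_rows, cov_rows.T))   # True: the matrix is symmetric

# rowvar=False treats each column as a variable instead.
cov_cols = np.cov(x, rowvar=False)
print(cov_cols.shape)                      # (5, 5)
```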

    Components

    On the main diagonal you find the variance of each vector, since var(X) = cov(X, X). Every other entry (i, j) is the covariance between vectors i and j.
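
    The diagonal can be compared against the variances computed directly. This is a small sketch; note that np.cov normalizes by N-1 by default, so ddof=1 is needed in np.var for the two to agree.

```python
import numpy as np

x = np.array([[10, 39, 19, 23, 28],
              [43, 13, 32, 21, 20],
              [15, 16, 22, 85, 15]])

cov = np.cov(x)

# Sample variance of each row; ddof=1 matches np.cov's default N-1 normalization.
row_variances = np.var(x, axis=1, ddof=1)

print(np.diag(cov))
print(np.allclose(np.diag(cov), row_variances))  # True
```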

    Positive and negative coefficients

    On the main diagonal no value can be negative, since each one is the variance of a vector. At any other position, the covariance can be written as the product of the two standard deviations s(X) and s(Y), which are always non-negative, and the Pearson correlation coefficient p, which lies in [-1, 1]. This coefficient is what makes the value positive or negative.

    cov(X, Y) = p(X,Y)s(X)s(Y)
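
    The identity can be verified numerically on the data from the question. In this sketch, std and rebuilt are illustrative names, and ddof=1 is used so the standard deviations match np.cov's default normalization.

```python
import numpy as np

x = np.array([[10, 39, 19, 23, 28],
              [43, 13, 32, 21, 20],
              [15, 16, 22, 85, 15]])

std = np.std(x, axis=1, ddof=1)   # s(X) for each row
corr = np.corrcoef(x)             # p(X, Y) for each pair of rows

# Entry (i, j) of the outer product is s(X_i) * s(X_j).
rebuilt = corr * np.outer(std, std)

print(np.allclose(rebuilt, np.cov(x)))  # True
```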

    There are three possibilities:

    1. p(X, Y)==0: no linear correlation between the vectors.
    2. p(X,Y)>0: positive correlation, meaning that when the values of X increase, the values of Y tend to increase too.
    3. p(X,Y)<0: negative correlation, meaning that when the values of X increase, the values of Y tend to decrease.
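
    The three cases can be illustrated with small made-up arrays (these are not the question's data; the names a, rising, falling, symmetric and the helper pearson are hypothetical). The last array depends on a but not linearly, which is why its coefficient is zero.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rising = 2 * a + 1         # increases together with a
falling = -3 * a           # decreases as a increases
symmetric = (a - 3) ** 2   # depends on a, but with no linear trend

def pearson(u, v):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(u, v)[0, 1]

print(pearson(a, rising))     # close to  1
print(pearson(a, falling))    # close to -1
print(pearson(a, symmetric))  # close to  0
```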

    The standard deviations only affect the magnitude of the entries: for the same correlation, the covariance is larger in absolute value when the standard deviations of the data are larger. This is why covariances of different pairs are hard to compare directly.
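
    To see that the standard deviations only change the scale, one can rescale a single vector and compare. This is a sketch under an assumed change of units (the factor 10 is arbitrary): the covariances involving that vector grow, while the correlations stay the same.

```python
import numpy as np

x = np.array([[10, 39, 19, 23, 28],
              [43, 13, 32, 21, 20],
              [15, 16, 22, 85, 15]], dtype=float)

# Hypothetical rescaling: multiply the third vector by 10.
scaled = x.copy()
scaled[2] *= 10

cov_x, cov_scaled = np.cov(x), np.cov(scaled)

# Covariances with the third vector are multiplied by 10, its variance by 100.
print(np.isclose(cov_scaled[0, 2], 10 * cov_x[0, 2]))    # True
print(np.isclose(cov_scaled[2, 2], 100 * cov_x[2, 2]))   # True

# The correlation matrix is unchanged by the rescaling.
print(np.allclose(np.corrcoef(x), np.corrcoef(scaled)))  # True
```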

    Visualization

    To better visualise the content of the matrix I am using the heatmap function from the seaborn python package. Also I have added the correlation matrix to better compare the results.

    import numpy as np
    from matplotlib import pyplot as plt
    import seaborn as sns

    x = np.array([[10, 39, 19, 23, 28],
                  [43, 13, 32, 21, 20],
                  [15, 16, 22, 85, 15]])

    plt.rcParams['figure.figsize'] = [10, 5]

    # Left panel: covariance matrix (each row of x is one variable).
    plt.subplot(1, 2, 1)
    sns.heatmap(np.cov(x),
                annot=True,
                cbar=False,
                square=True,  # keep the cells square
                fmt="0.2f",
                cmap="YlGnBu",
                xticklabels=range(len(x)),
                yticklabels=range(len(x)))
    plt.title("Covariance matrix")

    # Right panel: correlation matrix, with every entry in [-1, 1].
    plt.subplot(1, 2, 2)
    sns.heatmap(np.corrcoef(x),
                annot=True,
                cbar=False,
                square=True,
                fmt="0.2f",
                cmap="YlGnBu",
                xticklabels=range(len(x)),
                yticklabels=range(len(x)))
    plt.title("Correlation matrix")
    plt.show()
    

    Output:

    [Image: heatmaps of the covariance and correlation matrices]

    Interpretation

    The third vector has an exceptionally high variance compared with the others. All pairs of vectors are negatively correlated. Vectors 1 and 2 (labelled 0 and 1 in the heatmaps) are strongly negatively correlated, with a coefficient of about -0.95, while vectors 1 and 3 are the least correlated, with a coefficient of about -0.06.
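
    The same reading can be done programmatically. The sketch below ranks the pairs by the absolute value of their correlation; the names pairs, strongest and weakest are illustrative, and vectors are numbered from 1 to match the text.

```python
import numpy as np
from itertools import combinations

x = np.array([[10, 39, 19, 23, 28],
              [43, 13, 32, 21, 20],
              [15, 16, 22, 85, 15]])

corr = np.corrcoef(x)
variances = np.diag(np.cov(x))

# Correlation of each distinct pair, keyed by 1-based vector numbers.
pairs = {(i + 1, j + 1): corr[i, j]
         for i, j in combinations(range(len(x)), 2)}

strongest = max(pairs, key=lambda p: abs(pairs[p]))
weakest = min(pairs, key=lambda p: abs(pairs[p]))

for pair, r in pairs.items():
    print(pair, round(float(r), 2))
print("strongest pair:", strongest)
print("weakest pair:", weakest)
print("largest variance: vector", int(np.argmax(variances)) + 1)
```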