Search code examples
pythonnumpystatisticscovarianceeigenvector

Principle axes of covariance matrix does not line up with its angle


I'm trying to get the major axis of a covariance (gradient and intercept). I'm using the sorted eigenvectors to calculate the angle of the ellipse but when I plot the resulting ellipse against the major axis, they don't line up.

Does anyone have any ideas?

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def eigsorted(cov):
    vals, vecs = np.linalg.eigh(cov)
    order = vals.argsort()[::-1]
    return vals[order], vecs[:,order]

def orientation_from_covariance(cov, sigma):
    vals, vecs = eigsorted(cov)
    theta = np.degrees(np.arctan2(*vecs[:,0][::-1]))
    w, h = 2 * sigma * np.sqrt(vals)
    return w, h, theta

def plot_ellipse(ax, mu, covariance, color, linewidth=2, alpha=0.5):
    x, y, angle = orientation_from_covariance(covariance, 2)
    e = Ellipse(mu, x, y, angle=angle)
    e.set_alpha(alpha)
    e.set_linewidth(linewidth)
    e.set_edgecolor(color)
    e.set_facecolor(color)
    e.set_fill(False)
    ax.add_artist(e)
    return e


from statsmodels.stats.moment_helpers import corr2cov
corr = np.eye(2)
corr[0, 1] = corr[1, 0] = 0.7
cov = corr2cov(corr, [1, 5])
mu = [1, 1]

vectors = eigsorted(cov)[1].T
gradients = [v[0] / v[1] for v in vectors]
intercepts = [mu[1] - (gradient*mu[0]) for gradient in gradients]

plt.scatter(*np.random.multivariate_normal(mu, cov, size=9000).T, s=1);
plot_ellipse(plt.gca(), mu, cov, 'k')
_x = np.linspace(*plt.xlim())
for i,g in zip(intercepts, gradients):
    plt.plot(_x, i + (_x * g), 'k');

covariance


Solution

  • The problem was the following line

    # gradients = [v[0] / v[1] for v in vectors] # wrong
    gradients = [v[1] / v[0] for v in vectors] # correct
    

    because the gradient is the change in y over the change in x. The figure then looks like this. Ellipse Plot

    I also added plt.figure() before the plotting begins and plt.axis("equal") after the plot_ellipse call.

    I would also like to cite the numpy.linalg.eigh documentation:

    w : (…, M) ndarray

    The eigenvalues in ascending order, each repeated according to its multiplicity.

    and thus the eigsorted function could be left out.