Tags: numpy, linear-algebra, pca, matrix-factorization

Eigenvalues of a covariance matrix using QR factorization


Given a matrix X of dimension D x N, I want to compute the eigenvalues of C = np.dot(X, X.T) / N using QR factorization, based on the following:

[image]

we expect the eigenvalues of C to be np.diag(np.dot(r.T, r)) / N, computed as follows (V corresponds to pca.components_.T in the full example below):

q, r = np.linalg.qr(np.dot(X.T, V))
lambdas2 = np.diag(np.dot(r.T, r)) / N
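
For reference, the identity behind this computation can be checked with NumPy alone. The sketch below assumes that X is centered along each row (feature) and that V holds the eigenvectors of C as columns; the names `C`, `evals` and `rng` are illustrative and not part of the original post. Since np.dot(X.T, V) = q r with orthonormal q, we get r.T r = V.T X X.T V, which is N times the diagonal matrix of eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 5, 200
X = rng.random((D, N))
X = X - X.mean(axis=1, keepdims=True)   # center each feature (row) over the N samples

C = X @ X.T / N                         # D x D covariance with 1/N normalization
evals, V = np.linalg.eigh(C)            # eigenvalues (ascending), eigenvectors as columns of V

q, r = np.linalg.qr(X.T @ V)            # X.T @ V is N x D with mutually orthogonal columns
lambdas2 = np.diag(r.T @ r) / N         # r.T @ r equals V.T @ X @ X.T @ V

print(np.allclose(lambdas2, evals))
```

Note that the diagonal of r.T r only gives the eigenvalues because V diagonalizes C; with an arbitrary V, r.T r would generally not be diagonal.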

However, the values in lambdas2 differ from those in lambdas1, which I obtain with the code below:

from sklearn.decomposition import PCA
pca = PCA()
pca.fit(X)
lambdas1 = pca.explained_variance_

The full example is:

import numpy as np
from sklearn.decomposition import PCA
if __name__ == "__main__":
    N = 1000
    D = 20
    X = np.random.rand(D, N)

    X_train_mean = X.mean(axis=0)
    X_train_std = X.std(axis=0)
    X_normalized = (X - X_train_mean) / X_train_std

    pca = PCA(n_components=D)
    cov_ = np.cov(X_normalized) # A D x D array.
    pca.fit(cov_)
    lambdas1 = pca.explained_variance_

    projected_data = np.dot(pca.components_, X_normalized).T # An N x n_components array.

    q, r = np.linalg.qr(projected_data)
    lambdas2 = np.sort(np.diag(np.dot(r.T, r)) / N)[::-1]

Solution

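  • You need to pass X_normalized.T (an N x D array, one sample per row) to the fit method of PCA, not the covariance matrix.

    Computing the covariance matrix is already part of the PCA algorithm, so components_ and explained_variance_ are directly the eigenvectors and eigenvalues of the covariance matrix of whatever data is passed to fit. Fitting on cov_ therefore yields the eigenvalues of the covariance of the covariance matrix, not those of C.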
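
    A minimal sketch of the corrected example follows, under two assumptions that go beyond the answer above: the features are standardized along axis=1 (across the N samples, since X is D x N), and the QR result is divided by N - 1 rather than N, because explained_variance_ uses the unbiased 1/(N - 1) normalization. The seeded `rng` is only there to make the run reproducible.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
N, D = 1000, 20
X = rng.random((D, N))

# Standardize each feature (row) across the N samples.
X_normalized = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

pca = PCA(n_components=D)
pca.fit(X_normalized.T)                  # fit on the data (N x D), not on its covariance
lambdas1 = pca.explained_variance_       # eigenvalues, sorted in descending order

# Project onto the principal directions: an N x D array with orthogonal columns.
projected_data = X_normalized.T @ pca.components_.T

q, r = np.linalg.qr(projected_data)
lambdas2 = np.diag(r.T @ r) / (N - 1)    # same normalization as explained_variance_

print(np.allclose(lambdas1, lambdas2))
```

    With these changes the two sets of eigenvalues should agree up to floating-point error, and no extra sorting is needed because the columns of projected_data follow the order of pca.components_.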