Search code examples
pythonscikit-learnpcagpflow

PCA computed by GPflow and Sklearn doesn't match


I am performing PCA analysis by using Sklearn and GPflow. I noticed that the output returned by both the libraries doesn't match.

Please see below the sample code snippet-

import numpy as np
from gpflow.models import PCA_reduce
from sklearn.decomposition import PCA

X = np.random.random((100, 10))

for n in range(1, 6):
    X1 = PCA(n_components=n).fit_transform(X)
    X2 = PCA_reduce(X, n)
    print('[n=%d] allclose=%s' % (n, np.allclose(X1, X2)))

Below is the output-

[n=1] allclose=True
[n=2] allclose=False
[n=3] allclose=False
[n=4] allclose=False
[n=5] allclose=False

It matches only when the number of principal components is 1. Why such behavior?


Solution

  • Two different issues are conspiring here:

    1. The order of the eigenvalues are reversed for the two methods. In the sklearn implementation, eigenvectors are ordered by decreasing magnitude of their eigenvalue, while in the gpflow implementation, they're ordered by increasing magnitude. In particular, you should be comparing PCA(n).fit_transform(X) to PCA_reduce(X, n)[:, ::-1]. This of course also explains why you get what you expect when using only a single component.

    2. However, this in itself won't suffice: If $v$ is a length 1 eigenvector with a given eigenvalue, then $-v$ is as well, so you can not simply use np.allclose to determine if the results are coherent; you need to take into account the potential reversal. So, instead what you could use would be something like a = np.all(np.isclose(X1, X2), 0) to compare the vectors directly, b = np.all(np.isclose(X1, -X2), 0) (noting the minus) to compare them when all vectors in X2 are reversed, since then, a | b becomes the condition that they agree up to reversal. Then finally, np.all(a | b) will check that this holds true for each eigenvector.

    Indeed, the following modification to your test spits out all trues:

    In [74]: for n in range(1, 6):
        ...:     X1 = PCA(n_components=n).fit_transform(X)
        ...:     X2 = PCA_reduce(X, n)[:, ::-1]
        ...:     print('[n=%d] allclose=%s' % (n, np.all(np.all(np.isclose(X1, X2), 0) | np.all(np.isclose(X1, -X2), 0))))
    
    [n=1] allclose=True
    [n=2] allclose=True
    [n=3] allclose=True
    [n=4] allclose=True
    [n=5] allclose=True