Tags: python, scikit-learn, pca

PCA matrix with sklearn


I did PCA on some data and I want to extract the PCA matrix. This is my code (excluding loading the data):

from sklearn.decomposition import PCA
pca = PCA(n_components=5)
pca_result = pca.fit_transform(recon.data.cpu().numpy())
M = pca.components_

I thought that M should be the PCA matrix. However, when I print pca_result (first few rows), I get this:

[-21.08167   ,  -5.67821   ,   0.17554353,  -0.732398  ,0.04658243],
[-25.936056  ,  -6.535223  ,   0.6887493 ,  -0.8394666 ,0.06557591],
[-30.755266  ,  -6.0098953 ,   1.1643354 ,  -0.82322127,0.07585468]

But when I print np.transpose(np.matmul(M,np.transpose(recon))), I get this:

[-27.78438   ,  -2.5913327 ,   0.87771094,  -1.0819707 ,0.1037216 ],
[-32.63887   ,  -3.4483302 ,   1.3909296 ,  -1.1890743 ,0.12274324],
[-37.45802   ,  -2.9229708 ,   1.8665184 ,  -1.1728177 ,0.13301012]

What am I doing wrong and how do I get the actual PCA matrix? Thank you!


Solution

  • In PCA you go from an n-dimensional space to a different (rotated) n-dimensional space. This change of basis is done using an n×n matrix.

    This is indeed the matrix returned by pca.components_: its rows are the principal axes. Multiplying the PCA-transformed data by it, and adding back the mean that sklearn subtracted before projecting, reconstructs the original data X. That centering step is also what is missing in the question: recon was never centered, which is why multiplying the raw data by M does not reproduce pca_result.

    Here is a demonstration with the iris data:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.datasets import load_iris
    
    X = load_iris().data
    mu = np.mean(X, axis=0)  # feature means - PCA centers the data on these
    
    pca = PCA()
    X_pca = pca.fit_transform(X)
    M = pca.components_
    M
    # result:
    array([[ 0.36138659, -0.08452251,  0.85667061,  0.3582892 ],
           [ 0.65658877,  0.73016143, -0.17337266, -0.07548102],
           [-0.58202985,  0.59791083,  0.07623608,  0.54583143],
           [-0.31548719,  0.3197231 ,  0.47983899, -0.75365743]])
    

    i.e. a 4×4 matrix indeed (the iris data have 4 features).
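    Before reconstructing, it is worth checking the forward direction, because this is exactly where the computation in the question goes wrong: sklearn's PCA centers the data before projecting, so the transform is (X - mu) @ M.T, not a plain product of M with the raw data. A quick check, using only the objects defined above:

    # the PCA transform is centering followed by projection onto the rows of M;
    # the question skips the centering, hence the mismatch with pca_result
    print(np.allclose(X_pca, (X - mu) @ M.T))  # True
    print(np.allclose(mu, pca.mean_))          # True - sklearn stores the subtracted mean
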

    Let's reconstruct the original data using all PCs:

    X_hat = np.matmul(X_pca, M)
    X_hat = X_hat + mu # add back the mean
    print(X_hat[0]) # reconstructed
    print(X[0])     # original
    

    Result:

    [5.1 3.5 1.4 0.2]
    [5.1 3.5 1.4 0.2]
    

    i.e. perfect reconstruction.
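
    The check passes for the whole dataset, not just the first row:

    np.allclose(X, X_hat)
    # result:
    True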

    Reconstructing with fewer PCs, let's say 2 (out of 4):

    n_comp = 2
    X_hat2 = np.matmul(X_pca[:, :n_comp], M[:n_comp, :])
    X_hat2 = X_hat2 + mu
    print(X_hat2[0])
    

    Result:

    [5.08303897 3.51741393 1.40321372 0.21353169]
    

    i.e. a less accurate reconstruction, as expected, since we truncated to 2 PCs instead of using all 4.
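
    To make the trade-off concrete, here is a short sketch (reusing X, X_pca, M and mu from above) that prints the mean squared reconstruction error for each number of retained PCs; the error shrinks as PCs are added and vanishes when all 4 are kept:

    for k in range(1, 5):
        X_hat_k = np.matmul(X_pca[:, :k], M[:k, :]) + mu
        mse = np.mean((X - X_hat_k) ** 2)
        print(f"{k} PC(s): reconstruction MSE = {mse:.6f}")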

    (Code adapted from the great thread "How to reverse PCA and reconstruct original variables from several principal components?" at Cross Validated.)