python scikit-learn pca

Projecting new samples into existing PCA space?


I have a dataset on which I have performed PCA using scikit-learn. I have another dataset with the same features and would like to project it into the same PCA space that was fitted on the first dataset.

My understanding is that I have to transform and center the data in the same way the original dataset was and then use the eigenvectors to rotate the data.

I'm a little stuck as to how to do this based on the output of the sklearn.decomposition.PCA class.

So far I have:

import numpy as np
from sklearn.decomposition import PCA

X1 = np.loadtxt(fname="dataset1.txt")
pca = PCA(n_components=50)
pca.fit_transform(X1)
pca_result = pca.transform(X1)

X2 = np.loadtxt(fname="dataset2.txt")

Does anyone have any pointers on how this can be achieved?


Solution

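  • You have some redundancy there. fit_transform() fits the model and returns the data projected onto the principal components, while also storing the fitted parameters (the training mean and the component vectors) on the object, so there is no need to call transform() on X1 afterwards. For a new dataset, call only transform(): it applies the centering and rotation learned from the first dataset. See below: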

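    import numpy as np
    from sklearn.decomposition import PCA

    # Fit on the first dataset and keep its projection.
    X1 = np.loadtxt(fname="dataset1.txt")
    pca = PCA(n_components=50)
    Y1 = pca.fit_transform(X1)

    # Project the second dataset with the already-fitted model (no refit).
    X2 = np.loadtxt(fname="dataset2.txt")
    Y2 = pca.transform(X2)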
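
To connect this with the "center, then rotate with the eigenvectors" picture from the question, here is a minimal sketch of what transform() amounts to when whiten is left at its default of False. The random arrays are stand-ins for dataset1.txt and dataset2.txt, and their shapes (200 x 60 and 30 x 60) are assumptions made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-ins for the two datasets (shapes are assumed, not from the question).
rng = np.random.default_rng(0)
X1 = rng.normal(size=(200, 60))
X2 = rng.normal(size=(30, 60))

# Fit on the first dataset only.
pca = PCA(n_components=50)
Y1 = pca.fit_transform(X1)

# Library projection of the new samples.
Y2 = pca.transform(X2)

# Manual projection: subtract the mean learned from X1,
# then rotate onto the fitted component vectors (rows of components_).
Y2_manual = (X2 - pca.mean_) @ pca.components_.T

# The two results should agree up to floating-point error.
print(np.allclose(Y2, Y2_manual))
```

Note that the mean subtracted from X2 is the mean of X1, not the mean of X2. If the first dataset was scaled or otherwise preprocessed before fitting, the same fitted preprocessing would need to be applied to X2 before calling transform().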