python scikit-learn pca

Project a new vector into the PCA space of existing data in Python


Imagine I have training data with 9 dimensions and 6000 samples, and I applied PCA to it using sklearn's PCA class.
I reduced its dimensions to 4, and now I want to convert one new sample with 9 features into the 4-component space of my training data as fast as possible.
Here is my initial PCA code:

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(df1)
pca = PCA(n_components=4)
result = pca.fit_transform(X_std)

Is there any way to do this with sklearn's PCA?


Solution

  • To project data into the reduced-dimensional space learned by PCA, use the transform method. It centers the input with the training mean and then takes an efficient inner product between the input and the eigenvectors (principal axes):

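    from sklearn.decomposition import PCA

    pca = PCA(n_components=4)
    pca.fit(X_train)                      # learn the 4 principal axes from training data
    X_new_reduced = pca.transform(X_new)  # X_new: shape (n, 9), scaled like X_train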
    

    From the scikit-learn source (PCA.transform, after the input has been centered with X = X - self.mean_):

    X_transformed = fast_dot(X, self.components_.T)
    

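    So applying the PCA transformation is just a mean subtraction followed by a matrix product, which is very fast. You can apply the same projection to the training set and to any new data you want to test against later. If the training data was standardized first, pass new samples through the same fitted StandardScaler before calling transform.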

    This article describes the process in more detail: http://www.eggie5.com/69-dimensionality-reduction-using-pca
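A minimal sketch of the full workflow for the question's setup, assuming random data as a stand-in for df1 (6000 samples, 9 features). The key point is to keep the fitted StandardScaler object instead of calling fit_transform and discarding it, so a new sample is standardized with the training statistics. A single sample must also be reshaped to a 2-D array of shape (1, 9) before calling transform.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(6000, 9))  # stand-in for df1: 6000 samples, 9 features

# Keep the fitted scaler: new samples must be standardized with the
# training mean and standard deviation, not re-fitted on their own.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=4).fit(scaler.transform(X_train))

# A single new sample must be passed as a 2-D array of shape (1, 9).
x_new = rng.normal(size=9)
x_new_reduced = pca.transform(scaler.transform(x_new.reshape(1, -1)))
print(x_new_reduced.shape)  # (1, 4)
```

If you prefer a single object, sklearn.pipeline.make_pipeline(StandardScaler(), PCA(n_components=4)) bundles both steps, so calling transform on the fitted pipeline does the scaling and the projection in one call.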
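To check the claim that the transformation is only centering plus an inner product, the sketch below compares pca.transform with the manual formula (X_new - mean_) @ components_.T on small random data (an assumption for illustration). This holds for the default whiten=False; with whiten=True scikit-learn additionally divides each component by the square root of its explained variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X_train = rng.normal(size=(500, 9))  # illustrative training data
X_new = rng.normal(size=(3, 9))      # illustrative new samples

pca = PCA(n_components=4).fit(X_train)  # whiten=False by default

# Manual projection: center with the *training* mean, then take the inner
# product with each principal axis (the rows of components_).
manual = (X_new - pca.mean_) @ pca.components_.T
builtin = pca.transform(X_new)

print(np.allclose(manual, builtin))  # True
```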