python scikit-learn pca

Sklearn.decomposition.PCA: Get components by given ratio


I want to use PCA on some data to get the top principal components of a matrix that capture 95% of the total variance. I looked for a function that does this but could not find one.
The only approach I could find was the following:

import numpy as np
from sklearn.decomposition import PCA

# W_0 is the data matrix (n_samples x n_features)
pca = PCA().fit(W_0)
# index of the first component at which the cumulative explained variance exceeds 0.95
index_component = np.min(np.argwhere(np.cumsum(pca.explained_variance_ratio_) > 0.95))
# fit again, keeping only that many components
pca = PCA(n_components=index_component + 1)
pca.fit(W_0)

The problem with this approach is that I am fitting twice, which is a performance bottleneck. Is there a better way to do this?


Solution

  • According to the documentation: if 0 < n_components < 1 and svd_solver == 'full', PCA selects the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.

    To keep the components that explain at least 95% of the variance in a single fit, use PCA(n_components=0.95, svd_solver='full').
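
    As a minimal sketch of the single-fit approach, the snippet below fits PCA once with a fractional n_components and then checks how many components were kept. The random matrix is only a hypothetical stand-in for W_0, and its shape and seed are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for W_0: 200 samples, 10 correlated features.
rng = np.random.default_rng(0)
W_0 = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))

# A float in (0, 1) asks PCA to keep just enough components to explain
# more than that fraction of the total variance, using a single fit.
pca = PCA(n_components=0.95, svd_solver="full").fit(W_0)

n_kept = pca.n_components_                       # number of components chosen
explained = pca.explained_variance_ratio_.sum()  # fraction of variance retained
W_reduced = pca.transform(W_0)                   # shape: (200, n_kept)

print(n_kept, round(float(explained), 4))
```

    After fitting, pca.n_components_ holds the number of components actually kept and pca.components_ holds the components themselves, so no second fit is needed.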