machine-learning scikit-learn pca

sklearn PCA - Calculate % of variance retained for choosing k


I am using scikit-learn's PCA and trying to choose the minimum number of components k that satisfies $1 - \frac{\sum_{i=1}^{k} S_{ii}}{\sum_{j=1}^{n} S_{jj}} \le 0.01$, where S is the diagonal matrix from the SVD, so that 99% of the variance is retained.

  1. Does scikit-learn have a function that returns the minimum number of components for a given retained-variance threshold?
  2. Is there a more efficient way to come up with n_components?

Thanks.


Solution

  • Simply set n_components to a float between 0 and 1, and it will be used as a lower bound on the fraction of variance explained.

    From the scikit-learn documentation:

    n_components : int, None or string

    Number of components to keep. If n_components is not set, all components are kept:

    n_components == min(n_samples, n_features)

    If n_components == ‘mle’, Minka’s MLE is used to guess the dimension.

    If 0 < n_components < 1, select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.
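A minimal sketch of both approaches, assuming a recent scikit-learn release. The synthetic data, the 0.99 threshold, and the names k_auto and k_manual are illustrative and not part of the original answer. The first fit passes the target fraction as n_components; the second keeps every component and reads the smallest k off the cumulative explained_variance_ratio_, which is the manual calculation described in the question.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): 200 samples, 10 features,
# built from 3 latent directions plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
X = latent + 0.05 * rng.normal(size=(200, 10))

# Approach 1: let PCA choose k by passing the target fraction of variance.
# A float n_components is handled by the full SVD solver.
pca = PCA(n_components=0.99, svd_solver="full")
pca.fit(X)
k_auto = pca.n_components_

# Approach 2: keep all components and pick the smallest k whose cumulative
# explained-variance ratio exceeds the threshold.
full = PCA(svd_solver="full").fit(X)
cumulative = np.cumsum(full.explained_variance_ratio_)
k_manual = int(np.searchsorted(cumulative, 0.99, side="right") + 1)

print("k chosen by PCA:", k_auto)
print("k from cumulative sum:", k_manual)
print("variance retained:", pca.explained_variance_ratio_.sum())
```

After fitting, pca.n_components_ holds the number of components actually kept, and pca.transform(X) projects the data onto them. The second approach is mainly useful if you want to inspect the whole variance curve before settling on a threshold.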