machine-learning nlp pca svd

How many principal components should I choose for PCA?


I have a dataframe with a few categorical and numerical features. I concatenated a bag-of-words representation (CountVectorizer) of a text column to it, which resulted in more than 56,000 features. I'm therefore considering PCA to reduce the number of features.

I think choosing the right number of principal components is crucial here, but I'm not sure what value of n_components to use.


Solution

  • Plot the cumulative explained variance against the number of components k, then choose the smallest k whose components retain enough of the total variance. A common rule of thumb is to keep 95% or more.
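
As a rough illustration of that rule, here is a minimal sketch using only NumPy. The helper name `choose_n_components`, the 0.95 default, and the synthetic data are assumptions made for this example and are not from the question. It computes the cumulative explained-variance curve from the singular values of the centered data and returns the smallest k that reaches the threshold.

```python
import numpy as np

def choose_n_components(X, threshold=0.95):
    """Return (k, cumulative), where k is the smallest number of principal
    components whose cumulative explained variance reaches `threshold`.
    Hypothetical helper written for illustration."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to
    # the variance captured by each principal component.
    s = np.linalg.svd(X_centered, compute_uv=False)
    explained_ratio = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained_ratio)
    # First index where the cumulative curve reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against rounding leaving the final value just below the threshold.
    k = min(k, len(cumulative))
    return k, cumulative

# Synthetic stand-in data (an assumption): 50 features driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(200, 50))

k, cumulative = choose_n_components(X, threshold=0.95)
print("components needed for 95% variance:", k)
```

The `cumulative` array is what you would plot against k to look for the point where the curve flattens. If you are using scikit-learn, `PCA(n_components=0.95, svd_solver="full")` applies the same kind of threshold, and the fitted `explained_variance_ratio_` attribute gives the per-component ratios.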