python, scikit-learn, pca

Determine n_components of PCA such that the explained variance ratio reaches 0.99


How can I easily determine what n_components should be for scikit-learn's PCA so that the kept components explain at least 99% of the variance?


Solution

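  • I use the following loop. It refits PCA with decreasing n_components and reports the smallest tested value that still reaches the target ratio (with steps_down = 2, the result can overshoot the true minimum by one):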

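    import numpy as np
    from sklearn.decomposition import PCA
    
    # X_train: the (n_samples, n_features) training data
    wanted_explained_variance_ratio = 0.99
    steps_down = 2
    wanted_n_components = X_train.shape[1]
    first_time = True
    
    # Refit PCA for n_components = n_features - 1, n_features - 3, ..., stopping above 1
    for i in range(X_train.shape[1] - 1, 1, -steps_down):
        total_var_ratio = round(np.sum(PCA(n_components=i).fit(X_train).explained_variance_ratio_), 5)
        print('i =', i, 'with a variance ratio of', total_var_ratio)
        # The first i below the target means the previously tested value was the smallest that reached it
        if total_var_ratio < wanted_explained_variance_ratio and first_time:
            # min() avoids going past n_features when the very first i is already too low
            wanted_n_components = min(i + steps_down, X_train.shape[1])
            first_time = False
            # break  # uncomment to stop here instead of printing the remaining steps
    
    print("We should set n_components to: ", wanted_n_components)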
    

    Output on my training set (29 features; yours will differ):

    i = 28 with a variance ratio of 0.99975
    i = 26 with a variance ratio of 0.99901
    i = 24 with a variance ratio of 0.99807
    i = 22 with a variance ratio of 0.99699
    i = 20 with a variance ratio of 0.99574
    i = 18 with a variance ratio of 0.99428
    i = 16 with a variance ratio of 0.99195
    i = 14 with a variance ratio of 0.98898
    i = 12 with a variance ratio of 0.98534
    i = 10 with a variance ratio of 0.98073
    i = 8 with a variance ratio of 0.97405
    i = 6 with a variance ratio of 0.96544
    i = 4 with a variance ratio of 0.9539
    i = 2 with a variance ratio of 0.93572
    We should set n_components to:  16
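Refitting PCA once per candidate is slow on wide data, and stepping by 2 can miss the exact minimum. A minimal sketch of an alternative, assuming X_train is a NumPy array: fit PCA once with all components, take the cumulative sum of explained_variance_ratio_, and pick the first k where it reaches the target. The helper name smallest_n_components and the synthetic data are hypothetical, for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA


def smallest_n_components(X, target=0.99):
    """Hypothetical helper: return the smallest k whose first k components
    explain at least `target` of the variance, plus the cumulative ratios."""
    # Fit once, keeping every component; explained_variance_ratio_ is in decreasing order
    ratios = PCA().fit(X).explained_variance_ratio_
    cumulative = np.cumsum(ratios)
    # searchsorted (default side="left") gives the first index where cumulative >= target
    k = int(np.searchsorted(cumulative, target)) + 1
    # Guard against round-off leaving cumulative[-1] a hair below a target of 1.0
    return min(k, len(cumulative)), cumulative


# Synthetic stand-in for X_train (made-up data): 500 samples, 29 correlated features
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 29))
X_train = latent @ mixing + 0.05 * rng.normal(size=(500, 29))

n, cumulative = smallest_n_components(X_train, 0.99)
print("We should set n_components to:", n)
print("Variance explained by that many components:", round(cumulative[n - 1], 5))
```

The cumulative array also answers the loop's question for any candidate directly: cumulative[k - 1] is the total ratio that PCA(n_components=k) reports, so no refitting is needed.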
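scikit-learn can also do this search itself. Its PCA documentation states that when 0 < n_components < 1 and svd_solver="full", it keeps the number of components needed for the explained variance to be greater than that fraction, and it stores the chosen count in n_components_. A short sketch on made-up data (the data set is hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for X_train (made-up data): 500 samples, 29 correlated features
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 29)) + 0.05 * rng.normal(size=(500, 29))

# A float in (0, 1) is read as the fraction of variance the kept components must explain
pca = PCA(n_components=0.99, svd_solver="full").fit(X_train)

print("n_components_ chosen by scikit-learn:", pca.n_components_)
print("Variance explained:", round(pca.explained_variance_ratio_.sum(), 5))

# The fitted object reduces the data directly, so no second fit is needed
X_reduced = pca.transform(X_train)
print("Reduced shape:", X_reduced.shape)
```

Note the comparison is "greater than" rather than "at least"; the two only disagree when some cumulative ratio lands exactly on 0.99.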