How can I easily determine what n_components should be for scikit-learn's PCA?
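I personally use the following, which refits PCA with fewer and fewer components until the total explained variance ratio drops below my target of 0.99: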
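```python
import numpy as np
from sklearn.decomposition import PCA

# X_train is assumed to be an already-loaded 2-D feature array.
wanted_explained_variance_ratio = 0.99
steps_down = 2
wanted_n_components = X_train.shape[1]
first_time = True

# Walk n_components downward and remember the last value that still met the target.
for i in range(X_train.shape[1] - 1, 1, -steps_down):
    total_var_ratio = round(np.sum(PCA(n_components=i).fit(X_train).explained_variance_ratio_), 5)
    print('i =', i, 'with a variance ratio of', total_var_ratio)
    if total_var_ratio < wanted_explained_variance_ratio and first_time:
        # The previous step (i + steps_down) was the last one above the target.
        wanted_n_components = i + steps_down
        first_time = False
        # break  # uncomment to stop as soon as the target is crossed

print("We should set n_components to:", wanted_n_components)
```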
Expected output:
```
i = 28 with a variance ratio of 0.99975
i = 26 with a variance ratio of 0.99901
i = 24 with a variance ratio of 0.99807
i = 22 with a variance ratio of 0.99699
i = 20 with a variance ratio of 0.99574
i = 18 with a variance ratio of 0.99428
i = 16 with a variance ratio of 0.99195
i = 14 with a variance ratio of 0.98898
i = 12 with a variance ratio of 0.98534
i = 10 with a variance ratio of 0.98073
i = 8 with a variance ratio of 0.97405
i = 6 with a variance ratio of 0.96544
i = 4 with a variance ratio of 0.9539
i = 2 with a variance ratio of 0.93572
We should set n_components to: 16
```
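A possibly simpler route, sketched under a few assumptions: fit PCA once with every component kept and read the cumulative explained variance ratio, instead of refitting for each candidate value. scikit-learn also accepts a float 0 < n_components < 1 together with svd_solver="full", in which case it picks the component count itself and exposes it as `n_components_`. The breast-cancer dataset below is only a stand-in for X_train, and the helper name `n_components_for_variance` is hypothetical, not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA


def n_components_for_variance(X, target=0.99):
    """Hypothetical helper: smallest component count whose cumulative
    explained variance ratio exceeds `target`."""
    # Fit once, keeping all components, then accumulate the ratios.
    pca = PCA(svd_solver="full").fit(X)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # side="right" returns the first index where cumulative > target,
    # matching the "greater than" rule scikit-learn uses for float n_components.
    return int(np.searchsorted(cumulative, target, side="right") + 1)


# Stand-in for X_train (an assumption, not the data from the question).
X_train = load_breast_cancer().data

k = n_components_for_variance(X_train, 0.99)
print("We should set n_components to:", k)

# Alternative: let scikit-learn choose by passing the variance target as a float.
pca = PCA(n_components=0.99, svd_solver="full").fit(X_train)
print("PCA picked:", pca.n_components_)
```

Unlike the step-of-2 loop, this gives the exact minimum rather than a value that can overshoot by one, and it needs a single fit instead of one fit per candidate.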