I have a dataset with 23 rows and 48 columns. I am applying PCA to reduce the number of column dimensions. I use the following code examples and I see that only 23 components are retained:
#first
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA().fit(only_features)
plt.figure(figsize=(15, 8))
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')

#second
df_pca = pca.fit_transform(X=only_features)
df_pca = pd.DataFrame(df_pca)
print(df_pca.shape)
However, I would like to know which features are required. For example: if the original dataset had columns A–Z and was reduced by PCA, I would want to know which features were selected.
How can I do that?
Thanks for the help.
Credit to this answer1 & answer2. Sklearn's documentation states that when you don't specify the n_components parameter, the number of components retained is min(n_samples, n_features). Since min(23, 48) = 23, that is why you get 23 components in your case.
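You can verify this behaviour directly — here is a quick sanity check using random data of the same shape as yours (the values are a hypothetical stand-in for your real dataset):

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in for the real data: 23 samples, 48 features
rng = np.random.default_rng(0)
X = rng.normal(size=(23, 48))

# With n_components left unspecified, sklearn keeps
# min(n_samples, n_features) components
pca = PCA().fit(X)
print(pca.n_components_)  # -> 23
```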
Solution 1: if you use the Sklearn library (credit to this answer)
# variance explained by each retained component
print(pca.explained_variance_ratio_)
# absolute loadings: each row is a component, each column a feature;
# large values show which features drive that component
print(abs(pca.components_))
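To turn those loadings into actual feature names, a common approach is to take, for each principal component, the feature with the largest absolute loading. A minimal sketch (the DataFrame and its column names A–D here are hypothetical stand-ins for your only_features):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for only_features: 23 rows, 4 named columns
rng = np.random.default_rng(1)
only_features = pd.DataFrame(rng.normal(size=(23, 4)),
                             columns=['A', 'B', 'C', 'D'])

pca = PCA().fit(only_features)

# For each component, index of the feature with the largest |loading|
top_idx = np.abs(pca.components_).argmax(axis=1)
top_features = only_features.columns[top_idx]
for i, name in enumerate(top_features):
    print(f'PC{i + 1}: {name}')
```

Note that PCA builds each component from *all* features, so this only names the single most influential feature per component; inspect the full loading matrix if you need more detail.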
Solution 2: if you use the pca library (documentation)
from pca import pca

# Initialize
model = pca()
# Fit transform
out = model.fit_transform(X)
# Print the top features. The results show that f1 is best, followed by f2 etc
print(out['topfeat'])
# PC feature
# 0 PC1 f1
# 1 PC2 f2
# 2 PC3 f3
# 3 PC4 f4
# 4 PC5 f5
...
You can even plot the PCs with: model.plot()