I ran KMeans clustering after using PCA to reduce the numerical columns in my DataFrame from 5 to 2, and then drew a scatterplot of the result:
pc=PCA(n_components = 2).fit_transform(scaled_df)
scaled_df_PCA= pd.DataFrame(pc, columns=['pca_col1','pca_col2'])
#Then I did the KMeans and its plotting
label_PCA=final_km.fit_predict(scaled_df_PCA)
scaled_df_PCA["label_PCA_df"]=label_PCA
a=scaled_df_PCA[scaled_df_PCA.label_PCA_df==0]
b=scaled_df_PCA[scaled_df_PCA.label_PCA_df==1]
c=scaled_df_PCA[scaled_df_PCA.label_PCA_df==2]
# Pass x and y as keywords; recent seaborn versions no longer accept them positionally
sns.scatterplot(x=a.pca_col1, y=a.pca_col2, color="green")
sns.scatterplot(x=b.pca_col1, y=b.pca_col2, color="red")
sns.scatterplot(x=c.pca_col1, y=c.pca_col2, color="yellow")
This gives me 3 clusters based on the 2 columns produced by PCA. Now I want to get back to the original columns for further analysis of those clusters, but I can't work out how. When I use pc.components_ I get this error:
AttributeError Traceback (most recent call last)
/tmp/ipykernel_33/4073743739.py in
----> 1 pc.components_
AttributeError: 'numpy.ndarray' object has no attribute 'components_'
And when I try scaled_df_PCA.components_ I get:
AttributeError: 'DataFrame' object has no attribute 'components_'
So I would like to know how to recover information about the original columns that were reduced by PCA.
This line from your code stores a NumPy ndarray (the transformed data) in pc rather than the PCA instance:
pc=PCA(n_components = 2).fit_transform(scaled_df)
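To see why the attribute lookup fails, here is a minimal sketch. It assumes a made-up random array `X` standing in for `scaled_df`, which is not shown in the question. `fit_transform()` returns the projected data, so the fitted estimator is discarded as soon as the line finishes.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for scaled_df: 20 rows, 5 numeric columns
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))

# Same pattern as in the question: the PCA object itself is never kept
pc = PCA(n_components=2).fit_transform(X)

print(type(pc))                     # the transformed data, a NumPy ndarray
print(pc.shape)                     # (20, 2): one row per sample, one column per component
print(hasattr(pc, "components_"))   # False: the attribute lives on the PCA instance
```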
An easy fix is to create the PCA instance first and then call fit_transform() on it:
pca = PCA(n_components=2)
df_transformed = pca.fit_transform(scaled_df)
Afterwards, you can still access the attributes and methods of the fitted PCA instance, pca, such as pca.components_.
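As a rough sketch of what you can do once the instance is kept, the example below looks at the loadings, approximately reconstructs the scaled columns, and summarises each cluster in terms of the original features. The data, the column names `f1` to `f5`, and the KMeans settings are all assumptions for illustration, since the question does not show them. Note that PCA does not pick 2 of the 5 columns: each component is a linear combination of all of them, so the reconstruction from 2 components is only an approximation.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical stand-in for scaled_df (column names are made up)
rng = np.random.default_rng(0)
cols = ["f1", "f2", "f3", "f4", "f5"]
scaled_df = pd.DataFrame(rng.normal(size=(60, 5)), columns=cols)

# Keep the fitted PCA instance so its attributes stay available
pca = PCA(n_components=2)
scores = pca.fit_transform(scaled_df)

# Loadings: how much each original column contributes to each component
loadings = pd.DataFrame(
    pca.components_,
    columns=scaled_df.columns,
    index=["pca_col1", "pca_col2"],
)
print(loadings)
print(pca.explained_variance_ratio_)  # share of variance kept by each component

# Approximate reconstruction of the scaled columns from the 2 components
reconstructed = pd.DataFrame(
    pca.inverse_transform(scores),
    columns=scaled_df.columns,
    index=scaled_df.index,
)

# Cluster on the component scores (assumed settings: 3 clusters, fixed seed)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

# Rows keep their order, so the labels can be attached to the original columns
cluster_means = scaled_df.assign(label_PCA_df=labels).groupby("label_PCA_df").mean()
print(cluster_means)
```

Because the row order is unchanged by PCA and KMeans, attaching the labels to the original (scaled or unscaled) DataFrame is often simpler for cluster analysis than going through inverse_transform().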