How do I preserve labels when doing PCA? I looked at two tutorials and both leave this out completely: tutorial
Here is my code:
import pandas as pd
combinedOutputDataFrame = pd.DataFrame(resultArray)
# Separating out the features
x = combinedOutputDataFrame.loc[:, 0:31].values
# Separating out the target
y = combinedOutputDataFrame.loc[:,[32]].values
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)
principalDataFrame = pd.DataFrame(
    data=principalComponents,
    columns=['principal component 1', 'principal component 2', 'principal component 3'])
finalDf = pd.concat([principalDataFrame, combinedOutputDataFrame[[32]]], axis = 1)
How can I be sure what order principalComponents is in, though?
principalComponents
array([[129.58602603, -21.59786631, -6.84613849],
[-39.42963482, 35.19985695, 19.86945922],
[ 54.81949577, -5.96905719, -76.57776259],
...,
[ 69.21840475, -35.17983093, -39.66853653],
[ 18.91508026, -41.64341368, 0.21503516],
[145.91595004, 127.82236242, 115.14571367]])
My end goal is to visualise this and to color each dot on the plot by its corresponding class. But how can I put labels on the data after performing PCA?
The components are already sorted in descending order of explained variance, from the one that explains the most variance to the one that explains the least. You can check this by printing the explained variance ratio, pca.explained_variance_ratio_:
import numpy as np
from sklearn.decomposition import PCA
# just a random matrix
rand_matrix = np.random.rand(30,6)
pca = PCA(n_components=3)
principalComponents = pca.fit_transform(rand_matrix)
print(pca.explained_variance_ratio_)
Out (exact values vary between runs because the matrix is random):
[0.28898895 0.22460396 0.16874681]
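As for the labels: fit_transform returns one output row per input row, in the same order, so row i of principalComponents is the projection of row i of x and the labels can be attached by position. Here is a minimal sketch of that, ending with a scatter plot colored by class. The data is a made-up stand-in for resultArray (32 random feature columns plus a class label in column 32), and the names PC1, PC2, PC3, label and the output file name pca_by_class.png are only illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Hypothetical stand-in for resultArray: 32 features, class label in column 32.
rng = np.random.default_rng(0)
features = rng.random((60, 32))
labels = rng.integers(0, 3, size=60)
df = pd.DataFrame(np.column_stack([features, labels]))

x = df.loc[:, 0:31].values
pca = PCA(n_components=3)
components = pca.fit_transform(x)

# Sanity check: projecting the first input row alone gives the first output row.
assert np.allclose(pca.transform(x[:1]), components[:1])

# Reuse the original index so concat lines the rows up one to one.
pc_df = pd.DataFrame(components, index=df.index, columns=["PC1", "PC2", "PC3"])
final_df = pd.concat([pc_df, df[[32]].rename(columns={32: "label"})], axis=1)

# One scatter call per class, so each class gets its own color and legend entry.
fig, ax = plt.subplots()
for cls, group in final_df.groupby("label"):
    ax.scatter(group["PC1"], group["PC2"], label=f"class {int(cls)}")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.legend()
fig.savefig("pca_by_class.png")
```

Passing index=df.index matters if your DataFrame does not have the default 0..n-1 index (for example after filtering rows), because pd.concat with axis=1 aligns on the index rather than on position.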