I have data with five columns:
one | two | three | four | five
but I want this result:
pca 1 | pca 2 | five
Is it possible to apply PCA to only four of the columns?
There's nothing mathematically unsound about reducing only some of your features with PCA. The PCA features are linear combinations of the selected columns (rotated axes within that subspace); the other, orthogonal features are left unmodified.
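For the five-column layout in the question, one way to do this in scikit-learn is `ColumnTransformer` with `remainder="passthrough"`. The following is only a minimal sketch: the random `data` array is a stand-in for the real table (an assumption, since the question does not show the data), and the columns are addressed by position rather than by name.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA

# Placeholder data (assumption): 100 rows, columns one..five in that order.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))

# Apply PCA to columns 0-3 (one..four); column 4 (five) is passed through as-is.
ct = ColumnTransformer(
    [("pca", PCA(n_components=2), [0, 1, 2, 3])],
    remainder="passthrough",
)

# Output columns are: pca 1, pca 2, five
result = ct.fit_transform(data)
print(result.shape)
```

If the data lives in a pandas DataFrame, the list of positions can be replaced with a list of column names.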
I've included an example of a multivariate Gaussian in x, y, z. I use PCA on x and y, leaving z unmodified. You can inspect the plots to convince yourself that the second set of points is indeed the same as the first, just rotated in the x-y plane:
import numpy as np
import plotly.express as px
from sklearn.decomposition import PCA
means = [0,0,0]
# covariance must be symmetric and positive semi-definite; x and y are strongly correlated
cov = [[1,0.9,0],[0.9,1,0],[0,0,1]]
# get scatter points drawn from a multivariate normal
x,y,z = np.random.multivariate_normal(means, cov, 5000).T
# data
X = np.array([x,y,z]).T
# initial plot, with largest variance along x=y:
px.scatter_3d(x=x, y=y, z=z, labels={j:j for j in"xyz"}).show()
# fit pca in the x-y plane, leaving z un-modified
pca = PCA(n_components=2).fit(X[:, 0:2])
# get "rotated" pca components x', y'
q = pca.transform(X[:,0:2])
xp, yp = q[:,0], q[:,1]
px.scatter_3d(x=xp, y=yp, z=z, labels={"x":"x'", "y":"y'", "z":"z"}).show()
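If you would rather check this numerically than by eye, here is a small sketch (not part of the plots above). It assumes a covariance matrix with strongly correlated x and y, and verifies two things: that the fitted `components_` matrix is orthogonal (a rotation, possibly combined with a reflection), and that each point's distance from the centroid in the x-y plane is unchanged by the transform.

```python
import numpy as np
from sklearn.decomposition import PCA

# Assumed example data: x and y strongly correlated, z independent.
rng = np.random.default_rng(0)
cov = [[1, 0.9, 0], [0.9, 1, 0], [0, 0, 1]]
X = rng.multivariate_normal([0, 0, 0], cov, 1000)

# Fit and apply PCA on the x-y columns only.
pca = PCA(n_components=2).fit(X[:, 0:2])
q = pca.transform(X[:, 0:2])

# The rows of components_ are orthonormal, so R @ R.T is the identity.
R = pca.components_
is_orthogonal = np.allclose(R @ R.T, np.eye(2))

# Distances from the centroid are preserved by the transform.
d_before = np.linalg.norm(X[:, 0:2] - pca.mean_, axis=1)
d_after = np.linalg.norm(q, axis=1)
distances_preserved = np.allclose(d_before, d_after)

print(is_orthogonal, distances_preserved)
```

This only holds exactly because both components are kept; with fewer components than input columns, PCA also projects, so distances can shrink.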