python, machine-learning, scikit-learn, pca

Reducing data to one dimension using PCA


Can the data be reduced to a single principal component?

I tried it on the iris data set:

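from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import pandas as pd
import matplotlib.pyplot as plt

# X = standardized iris data (StandardScaler is assumed here)
X = StandardScaler().fit_transform(load_iris().data)

# Keep only the first principal component
pca = PCA(n_components=1)
pca_X = pca.fit_transform(X)

pca_df = pd.DataFrame(pca_X, columns=["PCA1"])

# Plot the single component against the sample index
plt.plot(pca_df["PCA1"], "o")
plt.show()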


The plot shows three distinct clusters. So can the dimension be reduced to 1?


Solution

  • You can choose to reduce the data to one dimension using PCA. The only thing PCA promises is that the resulting principal component points in the direction of highest variance in the data. It does not use the class labels, so that direction is not guaranteed to separate the classes.

    If you are reducing the dimensions in order to improve classification, you can use Linear Discriminant Analysis (LDA), which gives you the direction of maximum separation between the classes.
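
A minimal sketch of both options side by side, assuming the iris data is loaded with load_iris and standardized with StandardScaler (the question does not show how X was built, so those two steps are assumptions). It prints how much of the total variance the single principal component keeps, then projects the same data onto one LDA direction using the class labels:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# Assumption: X is the standardized iris data, y holds the species labels
X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# PCA is unsupervised: it keeps the direction of highest variance
pca = PCA(n_components=1)
pca_X = pca.fit_transform(X_std)
print("Share of variance kept by PC1:", pca.explained_variance_ratio_[0])

# LDA is supervised: it uses y to find the direction that best separates the classes
lda = LinearDiscriminantAnalysis(n_components=1)
lda_X = lda.fit_transform(X_std, y)

print("PCA output shape:", pca_X.shape)
print("LDA output shape:", lda_X.shape)
```

Checking explained_variance_ratio_ is a quick way to judge how much information a single component discards. With three classes, LDA can return at most two components, so n_components=1 is a valid choice here.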