python machine-learning scikit-learn pca

Reducing data to one dimension using PCA

Can the dimension of the data be reduced to only one principal component?

I tried it on the iris data set:

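from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import pandas as pd
import matplotlib.pyplot as plt

# Load the iris features and standardize them (zero mean, unit variance)
X = StandardScaler().fit_transform(load_iris().data)

# Project the 4 features onto the first principal component only
pca = PCA(n_components=1)
pca_X = pca.fit_transform(X)

pca_df = pd.DataFrame(pca_X, columns=["PCA1"])

# Plot each sample's PC1 score against its row index
plt.plot(pca_df["PCA1"], "o")
plt.show()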

The plot shows three different clusters. So can the dimension be reduced to 1?

Solution

Yes, you can reduce the dimensions to 1 using PCA. The only thing PCA promises is that the resulting principal component points in the direction of highest variance in the data; it does not promise that this direction separates your classes.
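One way to judge whether a single component is enough is to check how much of the total variance it keeps, via the fitted model's `explained_variance_ratio_`. A minimal sketch, assuming the same standardized iris data as in the question; the eigenvalue comparison at the end is only there to illustrate the "highest variance" property:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the iris features (assumed to match the question's preprocessing)
X = StandardScaler().fit_transform(load_iris().data)

pca = PCA(n_components=1)
pca_X = pca.fit_transform(X)

# Fraction of the total variance kept by the single component
ratio = pca.explained_variance_ratio_[0]
print(f"PC1 keeps {ratio:.1%} of the variance")

# The variance of the projected data equals the largest eigenvalue of the
# covariance matrix: no other single direction has more variance
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
print(np.var(pca_X, ddof=1), cov_eigvals.max())
```

If the ratio is low, a 1-D projection throws away a lot of the structure, even if some clusters still look separated in the plot.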

If you are reducing the dimensions in order to improve classification, you can use Linear Discriminant Analysis (LDA) instead. Unlike PCA, it uses the class labels and gives you the direction of maximum separation between the classes.
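A rough way to compare the two is to use each 1-D projection as a preprocessing step for a simple classifier and look at cross-validated accuracy. This is a sketch, not a definitive benchmark: the choice of `KNeighborsClassifier` and 5-fold cross-validation is an assumption made for illustration. Note that LDA needs `y` when fitting, and with 3 classes it can produce at most 2 components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# LDA is supervised: it uses the labels y to find the projection
lda = LinearDiscriminantAnalysis(n_components=1)
lda_X = lda.fit_transform(StandardScaler().fit_transform(X), y)

# Compare 1-D PCA vs 1-D LDA in front of the same classifier.
# Scaling and projection live inside the pipeline, so they are
# refit on each training fold and never see the test fold.
results = {}
for name, reducer in [("PCA", PCA(n_components=1)),
                      ("LDA", LinearDiscriminantAnalysis(n_components=1))]:
    pipe = make_pipeline(StandardScaler(), reducer, KNeighborsClassifier())
    scores = cross_val_score(pipe, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {results[name]:.3f}")
```

Putting the reducer inside the pipeline matters most for LDA: fitting it on the full data before cross-validation would let it use the test-fold labels.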