
Do you expect a linear classifier to separate the two classes in the 2D-PC space?


I have a total of 183 features, and I already applied PCA to reduce the dimensions and then made a scatter plot. Now the question is: "Analyse the scatter plot visually. Do you expect a linear classifier to separate the two classes in the 2D-PC space?"

from sklearn.decomposition import PCA

# Fit PCA on the standardized features and project onto the top components.
# Note: n_components=3 is requested, even though only the first two are plotted.
pca = PCA(n_components=3)
pca.fit(scaled_data)
x_pca = pca.transform(scaled_data)
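
As a quick sanity check (not part of the original post), you can look at how much of the total variance the first two components capture; if it is low, the 2D scatter plot may hide most of the structure in the 183 features:

print(pca.explained_variance_ratio_[:2])        # per-component variance ratios
print(pca.explained_variance_ratio_[:2].sum())  # total variance shown in the 2D view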

Below is the scatter plot code:

import matplotlib.pyplot as plt

plt.figure(figsize=(6, 6))
# Keep the return value so its legend_elements() can be used for the legend.
scatter = plt.scatter(x_pca[:, 0], x_pca[:, 1], c=y_train)
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
legend1 = plt.legend(*scatter.legend_elements(),
                     loc="upper right", title="Classes")
plt.show()

Here is the scatter plot:

[Scatter plot of the first two principal components, colored by class; the two classes visibly overlap.]

Solution

  • If you want to classify the points based on the two features produced by the PCA dimensionality reduction, then, as the plot shows, you clearly cannot expect a linear classifier to separate them.

    However, it might be possible to find a linear classifier in another space, computed by some kernel based on all of the features.

    There is a perfect example at this link of the kernel trick applied to classifying points inside and outside of a circle: https://medium.com/@ankitnitjsr13/math-behind-svm-kernel-trick-5a82aa04ab04

    The general kernel trick can be integrated into an ML classifier easily. Most frameworks support it, but you have to try out different kernels and see which one works best; a minimal sketch follows below.
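
As a minimal sketch (not from the original answer), assuming scikit-learn and toy circle data similar to the linked article, the snippet below compares a linear kernel against nonlinear kernels on data that is not linearly separable in 2D:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data like the linked example: one class inside a circle, one outside,
# so no straight line in the 2D plane can separate them.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Try several kernels and compare held-out accuracy; the RBF kernel
# effectively finds a linear separator in a higher-dimensional space.
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))

On data like this, the linear kernel typically scores near chance while the RBF kernel separates the classes almost perfectly, which is the behavior to look for when trying kernels out.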