Tags: python, matplotlib, scikit-learn, sklearn-pandas

Make a scatterplot from sklearn PCA result for python


I am trying to display a scatterplot of a dataset that I reduced to two dimensions with the PCA class from sklearn. The transformed data looks like this:

array([[ -3.18592855e+04,  -2.13479310e+00],
       [ -3.29633003e+04,   1.40801796e+01],
       [ -3.25352942e+04,   7.36921088e+00],
...

I expected that the following code would work:

import pylab
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA(n_components=2).fit(instances)
pca_2d = pca.transform(instances)

fig = plt.figure(figsize=(8,3))
plt.scatter(pca_2d[0],pca_2d[1])
plt.show()

But this produced an incorrect figure that only displays the first two values. What do I need to change to plot all of the points?


Solution

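  • You passed the first two rows of pca_2d to the scatterplot instead of its two columns. pca_2d[0] and pca_2d[1] are the first two samples, whereas pca_2d[:,0] and pca_2d[:,1] hold the first and second principal component for every sample.

    Use the column slices instead: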

    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA
    import numpy as np
    
    instances = np.array([[ 1,  2],
                          [ 3,  4],
                          [ 5,  6]])
    pca = PCA(n_components=2).fit(instances)
    pca_2d = pca.transform(instances)
    
    fig = plt.figure(figsize=(8,3))
    plt.scatter(pca_2d[:,0],pca_2d[:,1])
    plt.show()
    

    This correctly plots all 3 points:

    [Figure: scatterplot of the 3 points]
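
To see why the original call only drew two values, it may help to compare the shapes of the row and column selections. The snippet below is a minimal sketch that uses only NumPy and takes the three rows printed in the question as stand-in data; the variable names row_x, col_x and so on are just illustrative.

```python
import numpy as np

# Stand-in data: the first three rows of the transformed array from the question.
pca_2d = np.array([[-3.18592855e+04, -2.13479310e+00],
                   [-3.29633003e+04,  1.40801796e+01],
                   [-3.25352942e+04,  7.36921088e+00]])

# Row indexing: each selection is ONE sample with its two components.
row_x, row_y = pca_2d[0], pca_2d[1]

# Column slicing: each selection is ONE component across all samples.
col_x, col_y = pca_2d[:, 0], pca_2d[:, 1]

print(pca_2d.shape)  # (number of samples, 2)
print(row_x.shape)   # length 2, so scatter(row_x, row_y) draws only two points
print(col_x.shape)   # one entry per sample, which is what scatter needs

# Transposing and unpacking is an equivalent way to get the columns.
x, y = pca_2d.T
```

With the real data, plt.scatter(x, y) or plt.scatter(pca_2d[:,0], pca_2d[:,1]) should then draw one point per sample.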