I am trying to display a scatterplot of a dataset that I made two-dimensional with the PCA function from sklearn. My data is returned as follows:
array([[ -3.18592855e+04, -2.13479310e+00],
[ -3.29633003e+04, 1.40801796e+01],
[ -3.25352942e+04, 7.36921088e+00],
...
I expected that the following code would work:
import pylab
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
pca = PCA(n_components=2).fit(instances)
pca_2d = pca.transform(instances)
fig = plt.figure(figsize=(8,3))
plt.scatter(pca_2d[0],pca_2d[1])
plt.show()
But this returned an incorrect figure only displaying the first two values. What do I need to change to get this up and running?
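You passed the first two rows of pca_2d instead of its two columns to build your scatterplot. pca_2d[0] and pca_2d[1] are the first two samples, whereas pca_2d[:,0] and pca_2d[:,1] hold the first and second component of every sample.
Do: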
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
import numpy as np
instances = np.array([[ 1, 2],
[ 3, 4],
[ 5, 6]])
pca = PCA(n_components=2).fit(instances)
pca_2d = pca.transform(instances)
fig = plt.figure(figsize=(8,3))
plt.scatter(pca_2d[:,0],pca_2d[:,1])
plt.show()
This correctly plots all 3 points.
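To see the difference between row and column indexing without plotting anything, here is a minimal sketch. The array values are made up for illustration and only stand in for a PCA result with 3 samples and 2 components; they are not the output of the code above.

```python
import numpy as np

# Hypothetical stand-in for pca_2d: 3 samples (rows), 2 components (columns).
pca_2d = np.array([[-2.8, 0.1],
                   [ 0.0, -0.2],
                   [ 2.8, 0.1]])

first_row = pca_2d[0]      # one sample, both components -> shape (2,)
first_col = pca_2d[:, 0]   # first component of every sample -> shape (3,)

print(first_row.shape, first_col.shape)

# Equivalent alternative: transpose, then unpack into x and y.
x, y = pca_2d.T
```

With row indexing, scatter receives two arrays of length 2, so only two points are drawn no matter how many samples there are. If you prefer, `plt.scatter(x, y)` after the transpose should give the same figure as `plt.scatter(pca_2d[:,0], pca_2d[:,1])`.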