Python 3.5 Trying to plot PCA with sklearn and matplotlib


The code below raises TypeError: float() argument must be a string or a number, not 'Pred'.

I am struggling to figure out what is causing this error.

self.features is a list of NumPy arrays, each holding three floats (e.g. [1.1, 1.2, 1.3]). An example of self.features:

[array([-1.67191985,  0.1       ,  9.69981494]), array([-0.68486623,  0.05      ,  9.99085024]), array([ -1.36      ,   0.1       ,  10.44720459]), array([-2.46918915,  0.        ,  3.5483372 ]), array([-0.835     ,  0.1       ,  4.02740479])]
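For reference, here is a small check of what np.asarray does with a list like this. The name features is only a stand-in for self.features; the values are the ones shown above.

```python
import numpy as np

# Stand-in for self.features: a list of five arrays with three floats each
features = [
    np.array([-1.67191985, 0.1, 9.69981494]),
    np.array([-0.68486623, 0.05, 9.99085024]),
    np.array([-1.36, 0.1, 10.44720459]),
    np.array([-2.46918915, 0.0, 3.5483372]),
    np.array([-0.835, 0.1, 4.02740479]),
]

# Stacking equal-length arrays gives a 2-D array: one row per sample
x_np = np.asarray(features)
print(x_np.shape)  # (5, 3)
print(x_np.dtype)  # float64
```

So the input handed to PCA is a plain float array with five samples and three features.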

This is the method where the error is being thrown.

def generate_pca(self):
    pca = PCA(n_components=2)
    x_np = np.asarray(self.features)
    pca.fit(x_np)
    X_reduced = pca.transform(x_np)
    plt.figure(figsize=(10, 8))
    plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='RdBu')
    plt.xlabel('First component')
    plt.ylabel('Second component')

The full traceback is:

Traceback (most recent call last):
  File "/Users/user/PycharmProjects/Post-Translational-Modification-Prediction/pred.py", line 244, in <module>
    y.generate_pca()
  File "/Users/user/PycharmProjects/Post-Translational-Modification-Prediction/pred.py", line 222, in generate_pca
    plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='RdBu')
  File "/usr/local/lib/python3.5/site-packages/matplotlib/pyplot.py", line 3435, in scatter
    edgecolors=edgecolors, data=data, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/matplotlib/__init__.py", line 1892, in inner
    return func(ax, *args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/matplotlib/axes/_axes.py", line 3976, in scatter
    c_array = np.asanyarray(c, dtype=float)
  File "/usr/local/lib/python3.5/site-packages/numpy/core/numeric.py", line 583, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
TypeError: float() argument must be a string or a number, not 'Pred'

Solution

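  • The fix suggested by @WhoIsJack is to define y inside the method as np.arange(len(self.features)). The original method never assigned y, so the name appears to have resolved to the module-level y, which (going by y.generate_pca() in the traceback) is the Pred instance itself. matplotlib then failed when it tried to convert that object to floats for the c argument.

    The working code, for anyone who runs into a similar issue: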

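    # Assumes: import numpy as np; import matplotlib.pyplot as plt;
    #          from sklearn.decomposition import PCA
    def generate_pca(self):
        # One integer per sample, used only to color the points
        y = np.arange(len(self.features))
        pca = PCA(n_components=2)
        x_np = np.asarray(self.features)  # shape (n_samples, 3)
        pca.fit(x_np)
        X_reduced = pca.transform(x_np)  # project onto the first two components
        plt.figure(figsize=(10, 8))
        plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='RdBu')
        plt.xlabel('First component')
        plt.ylabel('Second component')
        plt.show()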
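A self-contained sketch of the same idea, for anyone who wants to run it in isolation. The Pred class here is a hypothetical stand-in (only the class name and the features attribute come from the post), and the Agg backend is an assumption made so the script runs without a display. The try/except repeats the float conversion from the last traceback frames on the instance itself, which is presumably what c=y triggered.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA


class Pred:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self, features):
        self.features = features

    def generate_pca(self):
        # Define the color values locally instead of relying on a global `y`
        colors = np.arange(len(self.features))
        x_np = np.asarray(self.features)
        # fit_transform fits the PCA and projects the data in one call
        X_reduced = PCA(n_components=2).fit_transform(x_np)
        fig, ax = plt.subplots(figsize=(10, 8))
        ax.scatter(X_reduced[:, 0], X_reduced[:, 1], c=colors, cmap="RdBu")
        ax.set_xlabel("First component")
        ax.set_ylabel("Second component")
        return X_reduced, colors, fig


features = [
    np.array([-1.67191985, 0.1, 9.69981494]),
    np.array([-0.68486623, 0.05, 9.99085024]),
    np.array([-1.36, 0.1, 10.44720459]),
    np.array([-2.46918915, 0.0, 3.5483372]),
    np.array([-0.835, 0.1, 4.02740479]),
]

# Module-level name `y`, mirroring `y.generate_pca()` in the traceback
y = Pred(features)

# Converting the instance itself to floats fails, as in the traceback
try:
    np.asanyarray(y, dtype=float)
    conversion_failed = False
except TypeError:
    conversion_failed = True

X_reduced, colors, fig = y.generate_pca()
print(conversion_failed)  # True
print(X_reduced.shape)    # (5, 2)
```

np.arange only gives each point a distinct color along the colormap. If real class labels exist for the samples, passing those as c would probably be more informative.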