python scikit-learn svm pca

Expected 2D array, got 1D array instead. Where's the mistake?


I am beginning to learn SVM and PCA. I tried to apply an SVM to the scikit-learn 'load_digits' dataset.

When I call the .fit method on the SVC, I get an error:

"Expected 2D array, got 1D array instead: array=[ 1.9142151 0.58897807 1.30203491 ... 1.02259477 1.07605691 -1.25769703]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."

Here is the code I wrote:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale
X_digits, y_digits = load_digits(return_X_y=True)
data = scale(X_digits)
pca=PCA(n_components=10).fit_transform(data)
reduced_data = PCA(n_components=2).fit_transform(data)
from sklearn.svm import SVC
clf = SVC(kernel='rbf', C=1E6)
X=reduced_data[:,0]
y=reduced_data[:,1]
clf.fit(X, y)

Can someone help me out? Thank you in advance.


Solution

  • The error occurs because clf.fit() requires X to be a 2D array of shape (N, n_features), where N is the number of samples in the dataset, but reduced_data[:,0] is a 1D array of shape (N,). Calling X.reshape(-1, 1) would turn it into an (N, 1) array, which is 2D as required and would resolve this particular error. However, I also believe that your interpretation of reduced_data may be incorrect (from my limited experience of sklearn):

    • The reduced_data array contains two principal components (n_components=2), that is, the projection of each sample onto the two directions of greatest variance in the data. Together they should be used as the new "data" (X).

    • Instead, you have taken the first column of reduced_data as the samples X and the second column as the target values y. My understanding is that a better approach is to set X = reduced_data, since the sample data should consist of both PCA features, and y = y_digits, since the labels (targets) are unchanged by PCA. SVC is a classifier, so y should hold class labels rather than continuous component values.
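
The shape difference described above can be seen with a small NumPy sketch. The array `reduced` here is a hypothetical stand-in for reduced_data (5 samples, 2 components), not the real digits data; it only illustrates how slicing a single column drops a dimension and how reshape(-1, 1) restores it.

```python
import numpy as np

# Hypothetical stand-in for reduced_data: 5 samples, 2 PCA components.
reduced = np.arange(10, dtype=float).reshape(5, 2)

# Selecting one column with an integer index drops a dimension.
first_column = reduced[:, 0]
print(first_column.shape)        # (5,)   1D, rejected by fit()

# reshape(-1, 1) turns it back into a 2D array with a single feature.
as_single_feature = first_column.reshape(-1, 1)
print(as_single_feature.shape)   # (5, 1) 2D, one feature per sample

# Indexing with a list keeps the result 2D in the first place.
kept_2d = reduced[:, [0]]
print(kept_2d.shape)             # (5, 1)
```

Passing the whole `reduced` array, which already has shape (5, 2), avoids the reshape entirely.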

    (I also noticed you defined pca = PCA(n_components=10).fit_transform(data) but did not go on to use it, so I have commented it out in the code in my answer.)

    As a result, you would have something like this:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import scale
    from sklearn.svm import SVC
    
    X_digits, y_digits = load_digits(return_X_y=True)
    data = scale(X_digits)
    # pca=PCA(n_components=10).fit_transform(data)
    reduced_data = PCA(n_components=2).fit_transform(data)
    
    clf = SVC(kernel='rbf', C=1e6)
    clf.fit(reduced_data, y_digits)
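
As a quick sanity check on the fix, the sketch below prints the shapes passed to fit() and scores the classifier on a held-out split. The split via train_test_split, the random_state, and C=1.0 are my own assumptions for illustration and are not part of the original code; a much larger C such as 1e6 may train slowly on this data. Scaling and PCA are fitted on the full dataset here only to keep the example short.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale
from sklearn.svm import SVC

X_digits, y_digits = load_digits(return_X_y=True)
reduced_data = PCA(n_components=2).fit_transform(scale(X_digits))

# X must be 2D (n_samples, n_features); y must be 1D (n_samples,).
print(reduced_data.shape, y_digits.shape)   # (1797, 2) (1797,)

# Assumption: hold out 25% of the samples to estimate accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    reduced_data, y_digits, test_size=0.25, random_state=0
)

# Assumption: C=1.0 instead of the original 1e6, to keep training fast.
clf = SVC(kernel='rbf', C=1.0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))            # mean accuracy on held-out data
```

With only two components kept, the accuracy is likely to be well below what the full 64 features allow, so raising n_components is worth trying.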
    

    I hope this has helped!