
GridSearchCV fitting


I'm having trouble fitting my classifier using binarized labels.

clf_linear = GridSearchCV(SVC(kernel='linear', class_weight='balanced'),
                      param_grid, cv=5)

clf_linear = clf_linear.fit(X_train_pca, y_train)

y_train was binarized by the following method:

y_train = label_binarize(y_train, classes=[1, 2, 3])

I got the following error:

File "C:\Python\lib\site-packages\sklearn\utils\validation.py", line 788, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (545, 3)

The input label shape is (682, 3), not (545, 3).

My professor told me to use binarized labels in GridSearchCV, but from reading the scikit-learn docs I don't think I can do this.


Solution

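  • It doesn't matter whether the shape is (682, 3) or (545, 3). The 545 is just the size of one training split: with cv=5, each fit uses about four fifths of your 682 samples. The real question is why the target has 3 columns. Your y (targets) should be a 1-d array for SVC. You don't need the label_binarize operation. Keep y_train as it is.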

    Doing this:

    y_train = label_binarize(y_train, classes=[1, 2, 3])
    

    will convert y_train to a label-indicator matrix. That format is used for multi-label classification problems (where a sample can have more than one class at a time). It's not what SVC expects for multi-class problems.

    Leave y_train as a one-dimensional array and SVC will handle the multi-class case itself.
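To make the difference concrete, here is a minimal sketch of the advice above. It does not use the asker's data: X and y are synthetic stand-ins for X_train_pca and y_train generated with make_classification, and the param_grid values are hypothetical, since the original grid was not shown. It compares the shape of the labels before and after label_binarize, shows that SVC rejects the 2-d version, and then fits GridSearchCV on the 1-d labels.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC

# Synthetic stand-in for X_train_pca / y_train (assumption: three classes labelled 1, 2, 3).
X, y = make_classification(n_samples=150, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
y = y + 1  # shift labels from 0, 1, 2 to 1, 2, 3

# label_binarize turns the 1-d labels into a 2-d label-indicator matrix.
y_bin = label_binarize(y, classes=[1, 2, 3])
print(y.shape, y_bin.shape)  # (150,) (150, 3)

# SVC expects 1-d targets, so the indicator matrix is rejected.
try:
    SVC(kernel="linear").fit(X, y_bin)
except ValueError as exc:
    print("2-d labels rejected:", exc)

# Hypothetical grid; the original param_grid was not shown in the question.
param_grid = {"C": [0.1, 1, 10]}

# Fitting on the original 1-d labels works; SVC handles the multi-class case internally.
clf_linear = GridSearchCV(SVC(kernel="linear", class_weight="balanced"),
                          param_grid, cv=5)
clf_linear.fit(X, y)
print(clf_linear.best_params_)
```

If binarized labels are truly required for some later step (for example per-class scoring), one option is to keep y_train one-dimensional for fitting and binarize a separate copy only where that step needs it.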