Tags: python, machine-learning, svm, multiclass-classification

Using SVM on top of CNN extracted features - How to do Multi-Class classification?


The MNIST data set has 10 output classes. I would like to use an SVM as the classifier for this classification task. I first used a CNN architecture (excluding the top layer/classifier) to extract features from the raw images and then fit those features to an SVM classifier.

SVM is a binary classifier, so a One-vs-One or One-vs-Rest approach is needed for a multi-class problem. I used the code below, based on the official scikit-learn documentation, but I couldn't figure out where I tell the model about the multi-class labels, or whether it uses the One-vs-One or One-vs-Rest approach.

The data set shapes look like this:

train : (2045, 32, 32)
label : (2045, 10)

After extracting features with the CNN (without its top layer), we get:

train : (7636, 256)  <- cnn_Xtrain
label : (7636,)      <- Ytrain
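
For reference, the extraction step looks roughly like this (a minimal sketch assuming a Keras CNN; the architecture and the placeholder data are illustrative only, not my exact network):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Convolutional base without the classification head ("non-top" CNN).
feature_extractor = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),      # 256-dimensional feature vector
])

Xtrain = np.random.rand(2045, 32, 32).astype("float32")   # placeholder images
Yonehot = np.eye(10)[np.random.randint(0, 10, 2045)]      # placeholder one-hot labels

cnn_Xtrain = feature_extractor.predict(Xtrain[..., np.newaxis])  # -> (2045, 256)
Ytrain = Yonehot.argmax(axis=1)                                  # one-hot -> integer labels, (2045,)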

The SVM classifier I've tried:

# SVC classifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# K_fold was defined elsewhere in my code; assumed here to be a 5-fold CV splitter.
K_fold = StratifiedKFold(n_splits=5)

SVMC = SVC(probability=True)
svc_param_grid = {'kernel': ['rbf'],
                  'gamma': [0.0001, 0.001],
                  'C': [1, 10, 50]}

gsSVMC = GridSearchCV(SVMC, param_grid=svc_param_grid, cv=K_fold,
                      scoring="accuracy", n_jobs=-1, verbose=1)

gsSVMC.fit(cnn_Xtrain, Ytrain)  # fit on the CNN-extracted features

SVMC_best = gsSVMC.best_estimator_

How does the SVM in this setup know that it is a multi-class problem, and whether to use one-vs-one or one-vs-rest? The score also looks suspicious to me: I get almost 98% accuracy. Is the RBF kernel specified in the grid search responsible for this, or did I do something wrong?

In addition, is it fine to extract features from the raw images with a CNN and then fit them to an SVM or a similar classifier?


Solution

  • The decision whether to use one-vs-rest or one-vs-one is set by the decision_function_shape parameter of the classifier (see the docs for SVC). There it states:

    Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one (‘ovo’) is always used as multi-class strategy. Changed in version 0.19: decision_function_shape is ‘ovr’ by default. New in version 0.17: decision_function_shape=’ovr’ is recommended. Changed in version 0.17: Deprecated decision_function_shape=’ovo’ and None.

    So 'ovr' is now the default for decision_function_shape, and since you didn't specify the parameter, that is what was used in your code. Note, however, that as the quoted docs say, this only changes the shape of the decision function that is returned: internally SVC always trains one-vs-one classifiers for multi-class problems. A quick way to see the difference is sketched below.
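
    A toy check with random stand-in data (assuming 10 classes, as in MNIST) shows the shapes described in the quote:

    import numpy as np
    from sklearn.svm import SVC

    X = np.random.rand(200, 256)          # stand-in for the CNN features
    y = np.random.randint(0, 10, 200)     # 10 classes, like MNIST

    ovr = SVC(decision_function_shape='ovr').fit(X, y)
    ovo = SVC(decision_function_shape='ovo').fit(X, y)

    # Only the shape of the returned decision function differs;
    # both models are trained one-vs-one internally.
    print(ovr.decision_function(X[:5]).shape)   # (5, 10) -> n_classes columns
    print(ovo.decision_function(X[:5]).shape)   # (5, 45) -> n_classes*(n_classes-1)/2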

    As for your question about using a CNN for feature extraction before fitting: in general it works. However, with the right kernel it should not really be necessary. If you want to reduce the dimension of your feature vectors, you can use PCA or a non-linear method such as a manifold embedding to get fewer features (a short sketch follows below).
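
    A minimal sketch of that idea, assuming scikit-learn and a 256-dimensional feature matrix like the cnn_Xtrain from the question (the number of components is arbitrary):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import Isomap

    cnn_Xtrain = np.random.rand(2045, 256)    # placeholder for the CNN features

    # Linear reduction: keep the first 50 principal components.
    Xtrain_pca = PCA(n_components=50).fit_transform(cnn_Xtrain)     # (2045, 50)

    # Non-linear alternative: a manifold embedding such as Isomap.
    Xtrain_iso = Isomap(n_components=50).fit_transform(cnn_Xtrain)  # (2045, 50)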

    Hope this helps.