Scikit learn wrong predictions with SVC

I am trying to predict the MNIST (http://pjreddie.com/projects/mnist-in-csv/) dataset with an SVM using the radial kernel. I want to train with few examples (e.g. 1000) and predict many more. The problem is that whenever I predict, the predictions are constant unless the indices of the test set coincide with those of the training set. That is, suppose I train with examples 1:1000 from my training examples. Then, the predictions are correct (i.e. the SVM does its best) for 1:1000 of my test set, but then I get the same output for the rest. If however I train with examples 2001:3000, then only the test examples corresponding to those rows in the test set are labeled correctly (i.e. not with the same constant). I am completely at a loss, and I think that there is some sort of bug, because the exact same code works just fine with LinearSVC, although evidently the accuracy of the method is lower.

First, I train with examples 501:1000 of training data:

# dat_train/test are pandas DFs corresponding to both MNIST datasets
dat_train = pd.read_csv('data/mnist_train.csv', header=None)
dat_test = pd.read_csv('data/mnist_train.csv', header=None)

svm = SVC(C=10.0)
idx = range(1000)
#idx = np.random.choice(range(len(dat_train)), size=1000, replace=False)
X_train = dat_train.iloc[idx,1:].reset_index(drop=True).as_matrix()
y_train = dat_train.iloc[idx,0].reset_index(drop=True).as_matrix()
X_test = dat_test.reset_index(drop=True).as_matrix()[:,1:]
y_test = dat_test.reset_index(drop=True).as_matrix()[:,0]
svm.fit(X=X_train[501:1000,:], y=y_train[501:1000])

Here you can see that about half the predictions are wrong

y_pred = svm.predict(X_test[:1000,:])
confusion_matrix(y_test[:1000], y_pred)

All wrong (i.e. constant)

y_pred = svm.predict(X_test[:500,:])
confusion_matrix(y_test[:500], y_pred)

This is what I would expect to see for all test data

y_pred = svm.predict(X_test[501:1000,:])
confusion_matrix(y_test[501:1000], y_pred)

You can check that all of the above are correct using LinearSVC!

Solution

The default kernel is RBF, in which case gamma matters. If gamma is not provided, it is auto by default, which is 1/n_features. You'd better run grid search to find the optimal parameters. Here I just illustrate the result is normal given suitable parameters.

In [120]: svm = SVC(C=1, gamma=0.0000001)

In [121]: svm.fit(X=X_train[501:1000,:], y=y_train[501:1000])
Out[121]:
SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma=1e-07, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [122]: y_pred = svm.predict(X_test[:1000,:])

In [123]: confusion_matrix(y_test[:1000], y_pred)
Out[123]:
array([[ 71,   0,   2,   0,   2,   9,   1,   0,   0,   0],
       [  0, 123,   0,   0,   0,   1,   1,   0,   1,   0],
       [  2,   5,  91,   1,   1,   1,   3,   7,   5,   0],
       [  0,   1,   4,  48,   0,  40,   1,   5,   7,   1],
       [  0,   0,   0,   0,  88,   2,   3,   2,   0,  15],
       [  1,   1,   1,   0,   2,  77,   0,   3,   1,   1],
       [  3,   0,   3,   0,   5,   4,  72,   0,   0,   0],
       [  0,   2,   3,   0,   3,   0,   1,  88,   1,   1],
       [  2,   0,   1,   2,   3,   9,   1,   4,  63,   4],
       [  0,   1,   0,   0,  16,   3,   0,  11,   1,  62]])