Tags: python, scikit-learn, svm, mnist, hyperparameters

Changing the hyperparameters of my SVM classifier trained on a subset of the MNIST digit dataset doesn't change the accuracy at all?


I'm a beginner when it comes to machine learning, and I am trying to find hyperparameters for an SVM that reach 95% accuracy or better on the MNIST digit dataset.

The CSV file the code reads from contains 784 attributes and one class label per row, where the label is the first column. As far as I can see, the dataset is not sorted by label.

Below is the code I've written so far. The variables Ctemp and gammatemp are the hyperparameters in question; the accuracy is stored in the testscore variable.

import numpy as np
import os
from sklearn.svm import SVC

path = os.getcwd()
# Build the paths with os.path.join; a plain "\u..." in a string literal is an
# invalid escape sequence, and hard-coded backslashes break on other platforms.
traindata = np.loadtxt(os.path.join(path, "username", "data", "mnist_train.csv"), delimiter=",", max_rows=4000)
testdata = np.loadtxt(os.path.join(path, "username", "data", "mnist_test.csv"), delimiter=",", max_rows=1000)

# Shuffle the rows (the seed is reset so both permutations are reproducible)
np.random.seed(123)
shuffletrain = np.random.permutation(4000)
np.random.seed(123)
shuffletest = np.random.permutation(1000)
traindata, testdata = traindata[shuffletrain, :], testdata[shuffletest, :]

y_train, y_test = traindata[:2000, 0], testdata[:500, 0]     # labels (first column)
X_train, X_test = traindata[:2000, 1:], testdata[:500, 1:]   # 784 pixel features

Ctemp = 10000
gammatemp = 'auto'          # 'auto' means 1 / n_features
machine = SVC(C=Ctemp, kernel='rbf', gamma=gammatemp)
machine.fit(X_train, y_train)
testscore = machine.score(X_test, y_test)   # accuracy on the held-out set
print(testscore)

As I train the SVM with different hyperparameters I get exactly the same test scores; I first noticed this when I tried a grid search. The test score does change if I add a skiprows argument to the loadtxt call, so different data points do produce different test scores.
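
For example, loading a different slice of the training file changes the score (this is the same call as above; the offset of 1000 rows is just an arbitrary example):

traindata = np.loadtxt(os.path.join(path, "username", "data", "mnist_train.csv"),
                       delimiter=",", skiprows=1000, max_rows=4000)  # different rows, different score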

I tried shuffling the dataset, as you can see in my code, and there is still no difference when I change the hyperparameters, which leads me to think this isn't an error in the dataset itself. Whether C is 1 or 10000, and whether gamma is 0.1 or 1000, it makes no difference.

When I use this exact code on a different dataset it works as expected, but for this dataset the test score is always 0.194, no matter which hyperparameters I use.

What am I missing? Why are the hyperparameters not affecting the test score?


Solution

  • tl;dr

    You can try a wider range of hyper-parameters to get better (different) test scores.
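
    For instance, here is a minimal sketch of such a wider, log-spaced sweep using scikit-learn's GridSearchCV (an alternative to the manual loop further down; the grid bounds mirror the ranges I tried):

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC
    import numpy as np

    # Cover many orders of magnitude instead of a narrow band of values
    param_grid = {"C": 10.0 ** np.arange(-8, 14, 3),
                  "gamma": 10.0 ** np.arange(-14, 8, 3)}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, n_jobs=-1)
    search.fit(X_train, y_train)
    print(search.best_params_, search.best_score_)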


    I don't have your MNIST files, so I used the Kaggle MNIST dataset and arbitrarily split it into train and test sets of the same sizes as yours.

    A nice theoretical fact about SVMs is that you can always reach 100% training accuracy when gamma is small and C is large enough (see this paper). So when you're not sure whether there's some weird bug in the code, you can try an extreme parameter setting and see what you get.
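
    For example, a quick sanity check (a sketch reusing SVC and X_train/y_train from the question; the two settings are arbitrary extremes, chosen for raw 0-255 pixel values):

    # Fit at two deliberately extreme settings: if the pipeline is healthy,
    # the two training scores should differ noticeably.
    for C, gamma in [(1e-8, 1e-14), (1e12, 1e-2)]:
        machine = SVC(C=C, kernel='rbf', gamma=gamma)
        machine.fit(X_train, y_train)
        print(f"C={C:g} gamma={gamma:g}",
              f"train={machine.score(X_train, y_train):.3f}",
              f"test={machine.score(X_test, y_test):.3f}")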

    I tried C from 10^-8 to 10^14 and gamma from 10^-14 to 10^8, and got a max test score of 0.92.

    See my Kaggle notebook for the full code.

    The code

    from tqdm import tqdm   # progress bar

    scores = {}  # (C, gamma) -> test accuracy
    log10C_range = list(range(-8, 14, 3))
    log10g_range = list(range(-14, 8, 3))
    with tqdm(total=len(log10C_range)*len(log10g_range)) as pbar:
        for log10C in log10C_range:
            for log10g in log10g_range:
                C = 10**log10C
                g = 10**log10g
                if (C, g) in scores:  # skip settings that were already evaluated
                    pbar.set_description(f"C={C:g} g={g:g} score={scores[C,g]:g}")
                    pbar.update(1)
                    continue
                machine = SVC(C=C, kernel='rbf', gamma=g)
                machine.fit(X_train, y_train)
                testscore = machine.score(X_test, y_test)
                scores[C, g] = testscore
                pbar.set_description(f"C={C:g} g={g:g} score={testscore:g}")
                pbar.update(1)
    
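    To read off the best setting afterwards (assuming the scores dict filled by the loop above):

    # (C, gamma) pair with the largest test score in the sweep
    best_C, best_g = max(scores, key=scores.get)
    print(f"best: C={best_C:g} gamma={best_g:g} score={scores[best_C, best_g]:g}")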

    Test score of different params