Tags: python, scikit-learn, sklearn-pandas

Sklearn SVM: SVR and SVC, getting the same prediction for every input


Here is a paste of the code: SVM sample code

I checked out a couple of the other answers to this problem...and it seems like this specific iteration of the problem is a bit different.

First off, my inputs are normalized, and I have five inputs per point. The values are all reasonably sized (healthy 0.5s and 0.7s, etc.; few values near 0 or 1).

I have about 70 x samples corresponding to their 70 y values. The y values are also normalized (they are the percentage changes of my function after each time-step).

I initialize my SVR (and SVC), train them, and then test them with 30 out-of-sample inputs...and get the exact same prediction for every input (and the inputs are changing by reasonable amounts--0.3, 0.6, 0.5, etc.). I would think that the classifier (at least) would have some differentiation...

Here is the code I've got:

from sklearn import svm
import pandas as pd

# train svr
my_svr = svm.SVR()
my_svr.fit(x_training, y_trainr)

# train svc
my_svc = svm.SVC()
my_svc.fit(x_training, y_trainc)


# predict regression
p_regression = my_svr.predict(x_test)
p_r_series = pd.Series(index=y_testing.index, data=p_regression)

# predict classification
p_classification = my_svc.predict(x_test)
p_c_series = pd.Series(index=y_testing_classification.index, data=p_classification)

And here are samples of my inputs:

x_training = [[1.52068627e-04, 8.66880301e-01, 5.08504362e-01, 9.48082047e-01, 7.01156322e-01],
              [6.68130520e-01, 9.07506250e-01, 5.07182647e-01, 8.11290634e-01, 6.67756208e-01],
              ... x 70 ]

y_trainr = [-0.00723209 -0.01788079  0.00741741 -0.00200805 -0.00737761  0.00202704 ...]

y_trainc = [ 0.  0.  1.  0.  0.  1.  1.  0. ...]

And the x_test matrix (30x5: 30 samples of five features each) is similar to the x_training matrix in terms of the magnitudes and variance of its inputs; the same goes for y_testr and y_testc.

Currently, the predictions for all of the tests are exactly the same (0.00596 for the regression, and 1 for the classification...)

How do I get the SVR and SVC functions to spit out relevant predictions? Or at least different predictions based on the inputs...

At the very least, the classifier should be able to make choices. I mean, even if I haven't provided enough dimensions for regression...


Solution

  • Try increasing your C from the default of 1.0. C controls how heavily training errors are penalized, so a small C keeps the model too flat; it seems you are underfitting.

    my_svc = svm.SVC(probability=True, C=1000)
    my_svc.fit(x_training, y_trainc)
    
    p_classification = my_svc.predict(x_test)
    

    p_classification then becomes:

    array([ 1.,  0.,  1.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,
            1.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,
            1.,  1.,  1.,  1.])
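Since probability=True was passed above, the fitted SVC also exposes predict_proba, which is handy for checking how confident those labels are. A minimal sketch (the random data here is a hypothetical stand-in for the question's x_training / y_trainc):

```python
import numpy as np
from sklearn import svm

# Stand-in data: 70 training points with 5 features, as in the question
rng = np.random.RandomState(0)
x_training = rng.rand(70, 5)
y_trainc = (x_training[:, 0] > 0.5).astype(float)  # hypothetical labels
x_test = rng.rand(30, 5)

my_svc = svm.SVC(probability=True, C=1000)
my_svc.fit(x_training, y_trainc)

# One row per test sample, one column per class; each row sums to 1
proba = my_svc.predict_proba(x_test)
print(proba.shape)  # (30, 2)
```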
    

    For the SVR case you will also want to reduce your epsilon. Epsilon defines a tube around the fitted function within which errors are ignored; since your y values are on the order of ±0.01, the default epsilon of 0.1 swallows all of the variation and yields a constant prediction.

    my_svr = svm.SVR(C=1000, epsilon=0.0001)
    my_svr.fit(x_training, y_trainr)
    
    p_regression = my_svr.predict(x_test)
    

    p_regression then becomes:

    array([-0.00430622,  0.00022762,  0.00595002, -0.02037147, -0.0003767 ,
            0.00212401,  0.00018503, -0.00245148, -0.00109994, -0.00728342,
           -0.00603862, -0.00321413, -0.00922082, -0.00129351,  0.00086844,
            0.00380351, -0.0209799 ,  0.00495681,  0.0070937 ,  0.00525708,
           -0.00777854,  0.00346639,  0.0070703 , -0.00082952,  0.00246366,
            0.03007465,  0.01172834,  0.0135077 ,  0.00883518,  0.00399232])
    

    You should tune your C parameter with cross validation so that the model performs best on whichever metric matters most to you. GridSearchCV can help you do this.
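As a sketch of that tuning step (assuming a recent scikit-learn where GridSearchCV lives in sklearn.model_selection, and with random stand-in data in place of the question's arrays):

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV

# Stand-in data shaped like the question's: 70 samples, 5 features
rng = np.random.RandomState(0)
x_training = rng.rand(70, 5)
y_trainr = 0.01 * rng.randn(70)  # small percentage changes

# Search over C and epsilon jointly with 5-fold cross-validation
param_grid = {"C": [1, 10, 100, 1000],
              "epsilon": [0.0001, 0.001, 0.01]}
search = GridSearchCV(svm.SVR(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(x_training, y_trainr)

print(search.best_params_)
best_svr = search.best_estimator_  # refit on all the data with the best settings
```

The scoring argument matters: pick the metric you actually care about (mean squared error here), since the "best" C and epsilon depend on it.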