python, scikit-learn, linear-regression, cross-validation

Different results produced by LinearRegression.score and svm.SVC(kernel='linear')


I am very new to machine learning.

I have one dataset, and I want to apply a train-test split and cross-validation for linear regression on it.

I split the dataset using train_test_split(X, y, test_size=0.3), and then fit both

from sklearn.linear_model import LinearRegression
from sklearn import svm

reg = LinearRegression().fit(X_train, y_train)
reg.score(X_test, y_test)

clf = svm.SVC(kernel='linear').fit(X_train, y_train)
clf.score(X_test, y_test)

reg.score gives an output of 0.98, but clf.score gives a value close to 0. Why are these results different?

I also attempted

from sklearn.model_selection import cross_val_score

clf = svm.SVC(kernel='linear', C=1, random_state=42)
scores = cross_val_score(clf, X, y, cv=2)

This also gives very small numbers, and I saw this warning:

UserWarning: The least populated class in y has only 1 members, which is less than n_splits=2.
  % (min_groups, self.n_splits)), UserWarning)

I have tried different values of cv, but cv > 5 gives the error "n_splits=5 cannot be greater than the number of members in each class." Note that the dataset I am using is not binary or simple multiclass; the target is more like monthly sales data than categories. I think that is probably what is causing the warning. What should I do in this case?


Solution

  • You are mixing up regression (for continuous-valued targets) and classification here.
    LinearRegression expects a continuous-valued target, and its score is the coefficient of determination (R²).
    SVC is a classification method, and its score is classification accuracy, so the two numbers are not comparable.
    This also explains the warning: when cross-validating a classifier, scikit-learn stratifies the folds by class, and each distinct sales value is treated as its own class with only one member.

    You state that your target is not categorical, so I suppose you would want to go with support vector regression (SVR) instead of SVC.

    Since you say you are new to machine learning, maybe also have a look at this tutorial.
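A minimal sketch of that approach might look like the following. Since the original dataset isn't shown, it uses synthetic regression data from make_regression as a stand-in:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# Synthetic continuous-valued data standing in for the monthly sales dataset
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# SVR handles continuous targets; like LinearRegression, its score is R²
svr = SVR(kernel='linear', C=1)

# With a regressor, cross_val_score uses plain KFold instead of stratified
# splitting, so the "least populated class" warning no longer applies
scores = cross_val_score(svr, X, y, cv=5)
print(scores)
```

Because the folds are no longer stratified by class, any cv up to the number of samples is allowed here.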