scikit-learn, svm, cross-validation

How to use classifiers used by cross_val_score


I am attempting to train a cross validated SVM model (for a school project). Given X and y, when I call

from sklearn import svm
from sklearn.model_selection import cross_val_score
clf = svm.SVC(gamma='scale')
scores = cross_val_score(clf, X, y, cv=4)

scores is set to an array as expected. However, when I then call clf.predict(test_x), it throws an exception with the message: This SVC instance is not fitted yet. Call 'fit' with appropriate arguments before using this method. (I wish it would return something like [scores, predictor], or maybe a CrossValidationPredictor that has a predict method, but that is not the case.)

Of course, I can call classifier = clf.fit(X, y), but that doesn't give me a cross validated SVM predictor. How do I get a cross validated predictor that I can use to, you know, predict?


Solution

  • Of course, I can call classifier = clf.fit(X, y), but that doesn't give me a cross validated SVM predictor. How do I get a cross validated predictor that I can use to, you know, predict?

    clf.fit(X, y) is exactly what you should do.

    There is no such thing as a cross validated predictor because cross-validation is not a method for training a predictor but, well, for validating a type of predictor. Let me quote the Wikipedia entry:

    Cross-validation [...] is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set.

    (Statistical analysis, here, includes prediction models such as regressors or classifiers.)

    The question that cross validation answers is "How well will my classifier perform later when I apply it to data I don't have yet?". Usually you try to cross validate different classifiers or hyperparameters and then select the one with the highest score, which is the one that is expected to generalize best to unseen data.
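
The comparison step described here could look roughly like the sketch below. The synthetic data from make_classification stands in for the asker's X and y, and the three candidate settings are arbitrary choices for illustration, not recommendations.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the asker's X and y (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate classifiers to compare (hyperparameter values chosen arbitrarily).
candidates = {
    "rbf, C=0.1": svm.SVC(kernel="rbf", C=0.1, gamma="scale"),
    "rbf, C=1": svm.SVC(kernel="rbf", C=1.0, gamma="scale"),
    "linear, C=1": svm.SVC(kernel="linear", C=1.0),
}

# Mean 4-fold cross-validation accuracy for each candidate.
# cross_val_score fits clones internally, so the originals stay unfitted.
mean_scores = {
    name: cross_val_score(clf, X, y, cv=4).mean()
    for name, clf in candidates.items()
}

# The candidate with the highest mean score is the one to keep.
best_name = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best:", best_name)
```

Because cross_val_score works on clones, none of the objects in candidates can predict afterwards, which is the "not fitted yet" error from the question.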

    Finally, you train the classifier on the full data set, because you want to deploy the best possible classifier.
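
If you want the selection and the final fit in one object, scikit-learn's GridSearchCV is one option (it is not mentioned in the question, so treat this as a sketch). The data is synthetic, the held-out test_x is created only so there is something to predict on, and the parameter grid is hypothetical.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the asker's data (an assumption for illustration).
X_all, y_all = make_classification(n_samples=200, n_features=10, random_state=0)
X, test_x, y, test_y = train_test_split(
    X_all, y_all, test_size=0.25, random_state=0
)

# Hypothetical grid; the values are arbitrary.
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["rbf", "linear"]}

# cv=4 scores every combination with 4-fold cross-validation;
# refit=True (the default) then retrains the best one on all of X, y.
search = GridSearchCV(svm.SVC(gamma="scale"), param_grid, cv=4, refit=True)
search.fit(X, y)

print(search.best_params_, search.best_score_)

# predict is delegated to the refitted best estimator.
predictions = search.predict(test_x)
```

With a single fixed classifier and no grid, the equivalent is simply clf.fit(X, y) followed by clf.predict(test_x). The cross-validation scores only tell you how much to trust that fitted model.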