scikit-learn, svm, unsupervised-learning, gridsearchcv, one-class-classification

Passing Target/Label data to Scikit-learn GridSearchCV's fit method for OneClassSVM


From my understanding, One-Class SVMs are trained without target/label data.

One answer at "Use of OneClassSVM with GridSearchCV" suggests passing target/label data to GridSearchCV's fit method when the estimator is a OneClassSVM.

How does the GridSearchCV method handle this data?

Does it actually train the OneClassSVM without the target/label data, and just use the target/label data for evaluation?

I tried following the GridSearchCV source code, but I couldn't find the answer.


Solution

  • Does it actually train the OneClassSVM without the Target/label data, and just use the Target/label data for evaluation?

    Yes to both.

    GridSearchCV does pass the labels to OneClassSVM in its fit call, but OneClassSVM simply ignores them. Notice in the source of OneClassSVM.fit how an array of ones is sent to the main SVM trainer instead of the given label array y. Parameters like y in fit exist only so that meta-estimators like GridSearchCV can work in a consistent way without worrying about whether an estimator is supervised or unsupervised. The labels are then used only by the scorer, which compares them with the predictions on each test fold.

    To actually test this, let's first detect outliers using GridSearchCV:

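    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.metrics import f1_score, make_scorer
    from sklearn.model_selection import GridSearchCV, KFold
    from sklearn.svm import OneClassSVM

    X, y = load_iris(return_X_y=True)
    yd = np.where(y == 0, -1, 1)  # class 0 -> outlier (-1), other classes -> inlier (+1)
    cv = KFold(n_splits=4, random_state=42, shuffle=True)
    model = GridSearchCV(OneClassSVM(), {'gamma': ['scale']}, cv=cv, scoring=make_scorer(f1_score))
    model = model.fit(X, yd)  # labels are passed here but only the scorer uses them
    print(model.cv_results_)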
    

    Note the per-fold scores in cv_results_, stored under the keys split0_test_score through split3_test_score.

    Now let's do it manually, without sending the labels yd in the fit call:

    for train, test in cv.split(X, yd):
        clf = OneClassSVM(gamma='scale').fit(X[train])  # fit on features only
        print(f1_score(yd[test], clf.predict(X[test])))
    

    Both should yield exactly the same scores.
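
The comparison above can also be automated rather than read off the printed output. The following is a minimal sketch, assuming a scikit-learn version in which GridSearchCV no longer accepts iid; the variable names (labels_ignored, grid_scores, manual_scores, scores_match) are illustrative and not part of the original answer. It checks two things: that fitting with and without labels gives identical predictions, and that the per-fold scores from GridSearchCV equal those from the manual loop.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)
yd = np.where(y == 0, -1, 1)  # class 0 -> outlier (-1), other classes -> inlier (+1)
cv = KFold(n_splits=4, shuffle=True, random_state=42)

# Check 1: passing labels to fit should not change the fitted model's predictions.
pred_with_y = OneClassSVM(gamma="scale").fit(X, yd).predict(X)
pred_without_y = OneClassSVM(gamma="scale").fit(X).predict(X)
labels_ignored = np.array_equal(pred_with_y, pred_without_y)

# Check 2: per-fold scores reported by GridSearchCV (labels passed to fit) ...
search = GridSearchCV(
    OneClassSVM(), {"gamma": ["scale"]}, cv=cv, scoring=make_scorer(f1_score)
)
search.fit(X, yd)
grid_scores = np.array(
    [search.cv_results_[f"split{i}_test_score"][0] for i in range(cv.get_n_splits())]
)

# ... versus a manual loop that never passes labels to fit.
manual_scores = np.array(
    [
        f1_score(yd[test], OneClassSVM(gamma="scale").fit(X[train]).predict(X[test]))
        for train, test in cv.split(X, yd)
    ]
)
scores_match = np.allclose(grid_scores, manual_scores)

print(labels_ignored, scores_match)
```

If the labels are indeed ignored during training, both printed values should be True. The fixed integer random_state is what makes the comparison meaningful: it lets cv.split produce the same folds inside GridSearchCV and in the manual loop.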