From my understanding, One-Class SVMs are trained without target/label data.
One answer at "Use of OneClassSVM with GridSearchCV" suggests passing Target/Label data to GridSearchCV's fit method when the classifier is the OneClassSVM.
How does the GridSearchCV method handle this data?
Does it actually train the OneClassSVM without the Target/label data, and just use the Target/label data for evaluation?
I tried following the GridSearchCV source code, but I couldn't find the answer.
Does it actually train the OneClassSVM without the Target/label data, and just use the Target/label data for evaluation?
Yes to both.
GridSearchCV does actually send the labels to OneClassSVM in its fit call, but OneClassSVM simply ignores them. Notice in the 2nd link how an array of ones is sent to the main SVM trainer instead of the given label array y. Parameters like y in fit exist only so that meta-estimators like GridSearchCV can work in a consistent way without worrying about whether an estimator is supervised or unsupervised.
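As a quick sanity check of the claim that y is ignored, here is a minimal sketch that fits the same OneClassSVM once without labels and once with deliberately meaningless labels. The names random_labels, clf_no_y and clf_with_y are made up for illustration, and the check assumes a scikit-learn version where OneClassSVM.fit accepts an optional y argument.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import OneClassSVM

X, _ = load_iris(return_X_y=True)

# Hypothetical, deliberately meaningless labels: if fit used them,
# the two models below would be expected to differ.
rng = np.random.default_rng(0)
random_labels = rng.choice([-1, 1], size=len(X))

clf_no_y = OneClassSVM(gamma='scale').fit(X)                   # features only
clf_with_y = OneClassSVM(gamma='scale').fit(X, random_labels)  # labels passed, expected to be ignored

# Compare both the hard predictions and the raw decision values.
same_predictions = np.array_equal(clf_no_y.predict(X), clf_with_y.predict(X))
same_decision = np.allclose(clf_no_y.decision_function(X),
                            clf_with_y.decision_function(X))
print(same_predictions, same_decision)
```

If the labels really are discarded, both comparisons should come out equal, since the two fits see identical inputs.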
To actually test this, let's first detect outliers using GridSearchCV:
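import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import OneClassSVM
X, y = load_iris(return_X_y=True)
yd = np.where(y == 0, -1, 1)  # class 0 is treated as the outlier class (-1)
cv = KFold(n_splits=4, random_state=42, shuffle=True)
# iid=False is dropped: the parameter was removed from recent scikit-learn releases
model = GridSearchCV(OneClassSVM(), {'gamma': ['scale']}, cv=cv, scoring=make_scorer(f1_score))
model = model.fit(X, yd)
print(model.cv_results_)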
Note all the splitX_test_score entries (split0_test_score, split1_test_score, and so on) in cv_results_.
Now let's do it manually, without sending the labels yd during the fit call:
for train, test in cv.split(X, yd):
    clf = OneClassSVM(gamma='scale').fit(X[train])  # Just features
    print(f1_score(yd[test], clf.predict(X[test])))
Both should yield exactly the same scores.
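To avoid comparing the printed numbers by eye, the two experiments can be combined into one script that checks the per-split scores programmatically. This is only a sketch under the same setup as above; grid_scores, manual_scores and scores_match are illustrative names, and iid is omitted on the assumption of a recent scikit-learn release.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)
yd = np.where(y == 0, -1, 1)  # class 0 plays the role of the outliers
cv = KFold(n_splits=4, shuffle=True, random_state=42)

# Scores from GridSearchCV, which passes yd to fit and to the scorer.
search = GridSearchCV(OneClassSVM(), {'gamma': ['scale']}, cv=cv,
                      scoring=make_scorer(f1_score))
search.fit(X, yd)
grid_scores = [search.cv_results_[f'split{i}_test_score'][0]
               for i in range(cv.get_n_splits())]

# Scores from manual training on features only; yd is used just for scoring.
manual_scores = []
for train, test in cv.split(X, yd):
    clf = OneClassSVM(gamma='scale').fit(X[train])
    manual_scores.append(f1_score(yd[test], clf.predict(X[test])))

scores_match = np.allclose(grid_scores, manual_scores)
print(scores_match)
```

Because the KFold splitter uses a fixed integer random_state, both passes should iterate over the same train/test splits, which is what makes the per-split comparison meaningful.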