python, machine-learning, scikit-learn, svm, auc

AUC-ROC for a non-ranking classifier such as a one-class SVM (OSVM)


I'm currently working with AUC-ROC curves. Let's say I have a non-ranking classifier, such as a one-class SVM, whose predictions are either 0 or 1 and cannot easily be converted into probabilities or scores. If I do not want to plot the ROC curve, but only want to calculate the AUC to see how well my model is doing, can I still do that? Would it still count as an AUC, especially since there are only two thresholds that can be used (0 and 1)? And if so, would it be as good as an AUC calculated from ranking scores?

Now let's say I have decided to plot the ROC curve using the labels produced by the SVM (0, 1); it would look like the picture below:

[plot: ROC-style curve drawn from the binary (0, 1) SVM predictions]

Would it still be considered an AUC curve?

Thank you very much for all your help and support.

Note: I have read the questions below and did not find an answer there: https://www.researchgate.net/post/How_can_I_plot_determine_ROC_AUC_for_SVM and https://stats.stackexchange.com/questions/37795/roc-curve-for-discrete-classifiers-like-svm-why-do-we-still-call-it-a-curve


Solution

  • The standard ROC curve requires varying the probability or score threshold of your classifier and obtaining, for each threshold value, the corresponding ordered pair of (false positive rate, true positive rate).

    Since the One-Class SVM is defined in such a way that its predictions do not come with probabilities or scores (this is specifically different from standard SVM classifiers), a ROC curve is inapplicable unless you construct your own version of a score, as discussed below.
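
    To make the contrast concrete, here is a minimal sketch (with made-up labels and predictions, not your data) of what scikit-learn's roc_curve and roc_auc_score return when given hard 0/1 predictions versus a continuous score: the hard predictions yield only one real operating point joined to (0, 0) and (1, 1), and the resulting "AUC" reduces to balanced accuracy.

    ```python
    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    # Made-up ground truth and hard 0/1 predictions (illustrative only)
    y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])
    y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 0, 1, 0])   # only two distinct values

    fpr, tpr, thr = roc_curve(y_true, y_pred)
    print(fpr, tpr)                        # just 3 points: (0,0), one operating point, (1,1)
    print(roc_auc_score(y_true, y_pred))   # equals (TPR + 1 - FPR) / 2, i.e. balanced accuracy

    # With a continuous score there is one point per distinct threshold,
    # and the AUC reflects the full ranking of the observations
    y_score = np.array([0.1, 0.8, 0.3, 0.9, 0.7, 0.4, 0.95, 0.2, 0.6, 0.35])
    fpr_s, tpr_s, thr_s = roc_curve(y_true, y_score)
    print(roc_auc_score(y_true, y_score))
    ```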

    Furthermore, One-Class SVM training is inherently extremely imbalanced, because the training data consists solely of "positive" examples, i.e. observations drawn from the distribution in question. ROC curves already suffer under large class imbalance, so the curve could be misleading: the classification scores for a small number of outliers would carry far more weight than the scores for the many non-outliers near the highest-density regions of the observed distribution. So avoiding ROC for this type of model, even if you create your own scores, is advisable.

    You are correct to choose precision vs. recall as a better metric, but in the plot shown in your question you are still overlaying it on axes of true positive rate and false positive rate, while the AUC-PR (precision-recall AUC score) looks like just a single point padded with 0 for the false positive rate (i.e. it appears to be a bug in your plotting code).

    In order to get an actual precision-recall curve, you need some way of associating a score with the outlier decision. One suggestion is to use the decision_function method of the fitted OneClassSVM object after training.

    If you compute the maximum of decision_function(x) over all inputs x (call this MAX), then one way of associating a score with an observation y is to treat it as score = MAX - decision_function(y).

    This assumes the labels are set up so that a large value of decision_function(x) means x is not an outlier, i.e. it carries the label of the positive class used for training. You could take the reciprocal or use some other transformation if you set up your problem with the labels reversed (that is, depending on whether you configured the OneClassSVM to predict '1' for an outlier or '1' for an inlier, even though the training data consists of only one class).
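
    As a minimal sketch of this construction (using scikit-learn's OneClassSVM on purely synthetic data; the variable names and the injected outliers are illustrative, not part of your setup):

    ```python
    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)

    # Training data: only "normal" observations (the single training class)
    X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

    # Test data: mostly normal points plus a few injected outliers
    X_test = np.vstack([rng.normal(0.0, 1.0, size=(90, 2)),
                        rng.uniform(low=-6.0, high=6.0, size=(10, 2))])

    ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

    # In scikit-learn, decision_function is positive for inliers and negative
    # for outliers, so a large value means "not an outlier"
    MAX = ocsvm.decision_function(X_train).max()

    # Score construction from above: higher score = more outlier-like
    outlier_score = MAX - ocsvm.decision_function(X_test)
    ```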

    Then, in the documentation of average_precision_score you can see that the input y_score can be a non-thresholded measure, such as the output of decision_function. You could also experiment with this score, perhaps taking its log, etc., if you have domain knowledge that suggests it is worth trying.
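
    For illustration, a small sketch of passing such a manually constructed score to average_precision_score (the labels and score values below are made up, with 1 marking an outlier):

    ```python
    import numpy as np
    from sklearn.metrics import average_precision_score

    # Made-up labels: 1 marks an outlier, 0 marks a normal point
    y_true = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1])

    # Made-up outlier scores (e.g. MAX - decision_function); higher = more outlier-like
    outlier_score = np.array([0.1, 0.3, 0.2, 1.7, 0.4, 0.1, 2.2, 0.3, 0.2, 0.9])

    # y_score may be any non-thresholded measure; no 0/1 conversion is needed
    ap = average_precision_score(y_true, outlier_score)
    print(f"average precision: {ap:.3f}")
    ```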

    Once you have these manually created scores, you can pass them to any of the precision/recall functions that need to vary the threshold. It's not perfect, but it at least gives you a sense of how well the decision boundary works for classification.
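
    For example, precision_recall_curve will sweep the threshold over those manual scores for you, returning one (precision, recall) pair per distinct threshold (again with the same made-up labels and scores as above):

    ```python
    import numpy as np
    from sklearn.metrics import precision_recall_curve

    # Same made-up setup: 1 marks an outlier, scores are the manual outlier scores
    y_true = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1])
    outlier_score = np.array([0.1, 0.3, 0.2, 1.7, 0.4, 0.1, 2.2, 0.3, 0.2, 0.9])

    # One (precision, recall) pair per distinct threshold; the final (1, 0)
    # point returned by scikit-learn has no associated threshold
    precision, recall, thresholds = precision_recall_curve(y_true, outlier_score)
    print(precision)
    print(recall)
    print(thresholds)
    ```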