python, scikit-learn, sklearn-pandas

How to modify the labels of the target variable in OneClassSVM in python


I know that in OneClassSVM the label -1 means outlier and 1 means inlier, and as I understand it zero is the boundary. Is there a way to get only two labels in the output rather than three? I want to treat the OneClassSVM labels the same way as those of other models. For example, other models I found use the labels "0" and "1", but the code I used to run those models does not work with OneClassSVM, because it returns the labels "-1" and "1".

for item in y_pred:
    item.replace("-1","0")

I tried changing "-1" to "0", but I'm sure this is not the right solution. In the end, I need only two labels to be printed, without corrupting the label values.


Solution

  • You can change the values from -1, 1 to 0, 1, for example, so that 0 represents outliers and 1 represents inliers. This is only for your convenience; it does not change what the model has predicted.

    There is nothing wrong with this conversion. However, you should be careful about how you interpret the results.

    To change labels:

    If y_pred is a list or NumPy array:

    y_pred = [-1, 1, -1]
    
    y_pred_new = [0 if i==-1 else 1 for i in y_pred]
    
    print(y_pred_new)
    # [0, 1, 0]
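
    If y_pred is a NumPy array, the same mapping can also be done without a Python loop. The following is a minimal sketch; the values in y_pred are made up for illustration and stand in for whatever OneClassSVM.predict returned:

```python
import numpy as np

# Hypothetical predictions, in the -1/1 form that OneClassSVM.predict returns.
y_pred = np.array([-1, 1, -1, 1, 1])

# Map -1 (outlier) to 0 and keep 1 (inlier) as 1.
y_pred_new = np.where(y_pred == -1, 0, 1)

# Equivalent arithmetic form: (-1 + 1) // 2 == 0 and (1 + 1) // 2 == 1.
y_pred_alt = (y_pred + 1) // 2

print(y_pred_new)
# [0 1 0 1 1]
```

    np.where states the mapping explicitly, so it is probably the easier of the two forms to read later.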
    

    Why does it not really matter?

    import numpy as np
    from sklearn.svm import SVC
    
    X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
    y = np.array([1, 1, 2, 2])
    
    clf = SVC(kernel='linear')
    clf.fit(X, y)
    print(clf.predict(X))
    # [1 1 2 2]
    

    Change the labels and re-fit:

    X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
    y = np.array([-1, -1, 1, 1])
    
    clf = SVC(kernel='linear')
    clf.fit(X, y)
    print(clf.predict(X))
    # [-1 -1  1  1]
    

    As you can see, in both cases the first two samples share one label and the last two share the other. Only the label values differ.
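
    Applied to OneClassSVM itself, the relabeling might look like the sketch below. The training data, the test points, and the nu and gamma settings are assumptions chosen only for illustration. Note that predict returns just the two values -1 and 1; zero is the threshold of decision_function, not a third label.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative data (an assumption): a cloud of points around the origin.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(100, 2))

# Two illustrative test points: one near the cloud, one far from it.
X_test = np.array([[0.0, 0.0], [8.0, 8.0]])

# nu and gamma are example settings, not recommendations.
clf = OneClassSVM(nu=0.1, gamma="scale").fit(X_train)

# predict returns -1 for outliers and 1 for inliers.
raw = clf.predict(X_test)

# Relabel so that 0 = outlier and 1 = inlier, matching 0/1 models.
labels = np.where(raw == -1, 0, 1)

print(raw)
print(labels)
```

    The model is fitted and queried exactly as before; only the array returned by predict is relabeled afterwards.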