Tags: python, scikit-learn, svm, libsvm, scikit-multilearn

What is the best value for the parameter class_weight in LinearSVC?


I have multi-label data (some classes have 2 labels and some have 10), and my model overfits with both class_weight='balanced' and class_weight=None. What are the best values to set for the class_weight parameter?

from sklearn.svm import LinearSVC
svm = LinearSVC(C=0.01,max_iter=100,dual=False,class_weight=None,verbose=1)

Solution

  • The class_weight parameter actually scales the C parameter for each class, as the documentation describes:

    class_weight : {dict, ‘balanced’}, optional

    Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
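
    To make the "balanced" formula concrete, here is a minimal sketch that applies n_samples / (n_classes * count) by hand. The helper name balanced_class_weight and the toy labels are assumptions for illustration; it uses collections.Counter instead of np.bincount so that arbitrary label values such as 121 work directly.

```python
from collections import Counter


def balanced_class_weight(y):
    """Return {label: n_samples / (n_classes * count_of_label)}."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy labels (made up): class 121 is frequent, class 123 is rare.
y_toy = [121] * 6 + [122] * 3 + [123] * 1

balanced = balanced_class_weight(y_toy)
for label, weight in sorted(balanced.items()):
    # The effective penalty for a class is weight * C.
    print(label, round(weight, 3))
```

    Rare classes get a weight above one and frequent classes a weight below one, so mistakes on rare classes are penalised more heavily for the same C.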

    Try varying class_weight while keeping C fixed, e.g. C=0.1.
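
    One way to do this systematically is to cross-validate a few candidate settings with C held fixed. The sketch below is only an illustration: the synthetic two-class dataset, the candidate weights, and the f1_macro scoring are all assumptions, not values taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic imbalanced data standing in for the real dataset (an assumption).
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=0,
)

# Candidate class_weight settings to compare; the dict values are arbitrary.
candidates = [None, "balanced", {0: 1, 1: 2}, {0: 1, 1: 5}]

search = GridSearchCV(
    LinearSVC(C=0.1, dual=False, max_iter=1000),  # C stays fixed
    param_grid={"class_weight": candidates},
    scoring="f1_macro",  # treats both classes equally, unlike accuracy
    cv=5,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

    Comparing the cross-validated score rather than the training score is what shows whether a given weighting actually reduces overfitting.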


    EDIT

    Here is a concise way to build the class_weight dictionary for your 171 classes (shown with five classes for brevity).

    # store the weights for each class in a list
    weights_per_class = [2,3,4,5,6]
    
    # Assume `y` holds the unique class labels, in the same order as the weights:
    y = [121, 122, 123, 124, 125]
    

    You need:

    # create the `class_weight` dictionary by pairing each label with its weight
    class_weight = {label: weight for label, weight in zip(y, weights_per_class)}
    
    print(class_weight)
    #{121: 2, 122: 3, 123: 4, 124: 5, 125: 6}
    
    # Use it as argument
    svm = LinearSVC(class_weight=class_weight)