python, machine-learning, scikit-learn, classification, metrics

Custom metrics for multiclass classification when class errors have different weights


I have a multiclass classification problem (e.g. the target variable has 4 possible outcomes: Product A, Product B, Product C and NO Product). Not all errors are equal: for example, if the true label is "Product A" and the prediction is "NO Product", it is not a big problem, while if the true label is "Product C", the impact of the error is much bigger. Basically, I need to build this information into the loss function of the algorithm (I am currently using XGBoost, Random Forest, etc.).

Any idea how to implement this in scikit-learn or other Python ML libraries?


Solution

  • Suppose this is the mapping of your classes:

    {'Product A': 0, 'Product B': 1, 'Product C': 2, 'NO Product': 3}
    

    Then, following the sklearn.ensemble.RandomForestClassifier docs, pass class_weight as follows:

    from sklearn.ensemble import RandomForestClassifier
    rf = RandomForestClassifier(n_estimators=100, class_weight={0: 1, 1: 1, 2: 2, 3: 1})
    

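    This gives 'Product C' twice the weight of the other classes, so errors on samples whose true label is 'Product C' count more during training. Note that class_weight depends only on the true class: it cannot make one wrong prediction for 'Product C' cost more than another.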
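
  • If the cost depends on both the true and the predicted class, one option is to score models with an explicit cost matrix. The following is only a minimal sketch: the function name mean_misclassification_cost and every number in COST are hypothetical, chosen for illustration, and would need to be replaced with costs that reflect the real business impact.

```python
# Hypothetical cost matrix: COST[true][pred], using the class mapping above
# (0 = Product A, 1 = Product B, 2 = Product C, 3 = NO Product).
# The numbers are illustrative assumptions, not values from the question.
COST = [
    [0, 1, 1, 1],   # true Product A
    [1, 0, 1, 1],   # true Product B
    [5, 5, 0, 10],  # true Product C: predicting NO Product is the costliest error
    [1, 1, 1, 0],   # true NO Product
]


def mean_misclassification_cost(y_true, y_pred, cost=COST):
    """Average cost per sample; 0 means every prediction was correct."""
    total = sum(cost[int(t)][int(p)] for t, p in zip(y_true, y_pred))
    return total / len(y_true)


y_true = [0, 2, 2, 3]
y_pred = [3, 3, 2, 3]
print(mean_misclassification_cost(y_true, y_pred))  # 2.75
```

    To use this for model selection in scikit-learn, it should be possible to wrap the function with sklearn.metrics.make_scorer(mean_misclassification_cost, greater_is_better=False) and pass the result as the scoring argument of GridSearchCV or cross_val_score. This changes how models are compared, not the loss the algorithm optimizes during training.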