python machine-learning scikit-learn classification metrics

Custom metrics for multiclass classification when class errors have different weights

I have a multiclass classification problem (e.g. the target variable has 4 possible outcomes: Product A, Product B, Product C and NO Product). Not all errors are equal: for example, if the true label is "Product A" and the prediction is "NO Product", it is not a big problem, while if the true label is "Product C", the impact of the same error is much bigger. Basically, I have to build this information into the loss function of the algorithm (I am currently using XGBoost, Random Forest, etc.).

Any idea how to implement this in scikit-learn or other Python ML libraries?

Solution

Suppose this is the mapping of your classes:

{'Product A': 0, 'Product B': 1, 'Product C': 2, 'NO Product': 3}

Then, following the sklearn.ensemble.RandomForestClassifier docs, use class_weight as follows:

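from sklearn.ensemble import RandomForestClassifier

# Keys are the encoded class labels; class 2 ('Product C') gets twice the weight.
rf = RandomForestClassifier(n_estimators=100, class_weight={0: 1, 1: 1, 2: 2, 3: 1})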

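This gives more weight to 'Product C' (class 2): during training, samples whose true label is 'Product C' count twice as much as samples from the other classes, so the model is pushed harder to get them right.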
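
Note that class_weight depends only on the true class, so it cannot distinguish between different wrong predictions for the same true label. To also measure how costly the errors are, one option is a cost-weighted metric built from the confusion matrix and wrapped with make_scorer. The following is only a sketch: the cost values, the function name average_misclassification_cost and the synthetic data from make_classification are assumptions for illustration and should be replaced with your own business costs and data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import cross_val_score

# Encoded labels: 0 = Product A, 1 = Product B, 2 = Product C, 3 = NO Product
LABELS = [0, 1, 2, 3]

# Hypothetical cost matrix: COST[i, j] is the cost of predicting class j
# when the true class is i. The numbers are made up for illustration.
COST = np.array([
    [0, 1, 1, 1],   # true Product A: predicting NO Product is a mild error
    [1, 0, 1, 2],   # true Product B
    [3, 3, 0, 5],   # true Product C: predicting NO Product is the worst error
    [1, 1, 1, 0],   # true NO Product
])

def average_misclassification_cost(y_true, y_pred):
    """Mean cost per sample, weighting each confusion-matrix cell by COST."""
    cm = confusion_matrix(y_true, y_pred, labels=LABELS)
    return float((cm * COST).sum() / cm.sum())

# Lower cost is better, so scikit-learn negates the value internally.
cost_scorer = make_scorer(average_misclassification_cost, greater_is_better=False)

# Synthetic 4-class data standing in for the real dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6, n_classes=4, random_state=0
)

rf = RandomForestClassifier(
    n_estimators=100, class_weight={0: 1, 1: 1, 2: 2, 3: 1}, random_state=0
)

# Each score is the negated average cost on one cross-validation fold.
scores = cross_val_score(rf, X, y, cv=3, scoring=cost_scorer)
print(scores)
```

The same scorer can be passed as scoring= to GridSearchCV, so you can tune class_weight (or other hyperparameters) against the cost you actually care about rather than plain accuracy.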