I have a multiclass classification problem (e.g. the target variable takes 4 different values: Product A, Product B, Product C and NO Product). Not all errors are equal: for example, if the true label is "Product A" and the prediction is "NO Product" it is not a big problem, while if the true label is "Product C" and the prediction is "NO Product" the impact of the error is much bigger. Basically, I need to insert this information into the loss function of the algorithm (I am currently using XGBoost, Random Forest, etc.).
Any idea how to implement this in scikit-learn or other ML libraries in Python?
Suppose this is the mapping of your classes:
{'Product A': 0, 'Product B': 1, 'Product C': 2, 'NO Product': 3}
Then, following the sklearn.ensemble.RandomForestClassifier docs, use the class_weight parameter as follows:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, class_weight={0: 1, 1: 1, 2: 2, 3: 1})
This will give more weight to 'Product C' (class 2): its samples count twice as much as the others when the trees are fit, so errors on that class are penalized more heavily.
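Since you also mentioned XGBoost, here is a minimal sketch of the same idea there. The scikit-learn wrapper XGBClassifier has no class_weight argument, but you can pass per-sample weights derived from the class of each sample to fit(). The data, the weight values, and the variable names below are purely illustrative, not part of the original answer.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Hypothetical data: 500 samples, 10 features, labels encoded per the mapping above.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = rng.integers(0, 4, size=500)

# Illustrative per-class weights: errors on 'Product C' (class 2) cost twice as much.
class_weight = {0: 1, 1: 1, 2: 2, 3: 1}

# Random Forest: pass the dict directly.
rf = RandomForestClassifier(n_estimators=100, class_weight=class_weight)
rf.fit(X, y)

# XGBoost: translate the class weights into one weight per training sample.
sample_weight = np.array([class_weight[label] for label in y])
xgb = XGBClassifier(n_estimators=100, objective="multi:softprob")
xgb.fit(X, y, sample_weight=sample_weight)

Both approaches only weight errors by the true class; if you need a full cost matrix (different costs per true/predicted pair), that goes beyond class or sample weights.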