Search code examples
scikit-learnrandom-forestone-hot-encodinglabel-encoding

Sklearn Random Forrest different accuracy values for different label encodings


I'm using sklearn Random Forrest to train my model. With the same input features for the model I tried passing the target labels first with label_binarize to create one hot encodings of my target labels and second I tried using label_encoder to encode my target labels. In both cases I'm getting different accuracy score. Is there a specific reason why this is happening, as I'm just using a different method to encode the labels without changing any input features.


Solution

  • https://datascience.stackexchange.com/questions/74364/random-forrest-sklearn-gives-different-accuracy-for-different-target-label-encod

    Basically when you encode your target labels as one hot encoding sklearn treats it as a multilabel problem as compared to label encoder which gives an 1d array where sklearn treats it as a multiclass problem.

    https://scikit-learn.org/stable/modules/multiclass.html