Tags: r, classification, random-forest, mlr

Random forest cutoff and accuracy metrics for binary classification in R


I am training a random forest classifier in R using mlr for binary classification.

My classes are well balanced.

      0         1 
0.5162791 0.4837209 

I've tuned my model in various ways by modifying the number of trees and mtry.

But I am having trouble picking the right accuracy metrics and determining what the cutoff should be.

My current performance measures are:

tpr.test.mean  fpr.test.mean  fnr.test.mean  fpr.test.mean  acc.test.mean mmce.test.mean   f1.test.mean
    0.7908072      0.2872358      0.2091928      0.2872358      0.7531250      0.2468750      0.7736447

How can I determine what the ideal cutoff should be for my classes? So far I have found 45/55 to work best, but is there a better way of doing this? Which performance metrics are usually best for binary classifiers?


Solution

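  • F1 is usually a safe bet. It does not let a classifier "trick" the measure by reaching 100% recall at the expense of precision (or 100% precision at the expense of recall). Because F1 is the harmonic mean of precision and recall, it stays low whenever either one is low, so both need to increase together for a good score.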

    Of course, there are exceptions, such as when recall matters more than precision (e.g. in cancer diagnosis, where missing a true case costs far more than a false alarm).

    So the metric should reflect what you are ultimately trying to optimize.
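To choose the cutoff itself, a common approach is to sweep candidate cutoffs over out-of-sample predicted probabilities (for example, from cross-validation rather than the final test set) and keep the one that maximizes the chosen metric. The sketch below is written in plain Python rather than R so that it does not depend on mlr; the helper names (confusion_counts, f_beta, best_cutoff), the 0.01 grid, and the toy data are all hypothetical. It uses F-beta, the standard generalization of F1 in which beta > 1 weights recall more heavily, which covers the cancer-diagnosis case above. Within mlr, tuneThreshold() is intended for a comparable search over predicted probabilities.

```python
from typing import List, Optional, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], scores: Sequence[float], cutoff: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), predicting class 1 whenever score >= cutoff."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score. beta = 1 gives F1 (harmonic mean of precision and recall);
    beta > 1 weights recall more heavily than precision."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_cutoff(
    y_true: Sequence[int],
    scores: Sequence[float],
    beta: float = 1.0,
    grid: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Return (cutoff, F-beta) for the grid cutoff with the highest F-beta.
    Ties keep the smallest cutoff."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # 0.01, 0.02, ..., 0.99
    best_c, best_score = grid[0], -1.0
    for c in grid:
        tp, fp, fn, _ = confusion_counts(y_true, scores, c)
        score = f_beta(tp, fp, fn, beta)
        if score > best_score:
            best_c, best_score = c, score
    return best_c, best_score


# Hypothetical held-out labels and predicted probabilities of class 1.
y_true = [0, 0, 1, 0, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]

# Predicting everything as class 1 gives 100% recall, but F1 is only about 0.67.
print(f_beta(*confusion_counts(y_true, scores, 0.0)[:3]))
print(best_cutoff(y_true, scores, beta=1.0))  # F1-optimal cutoff
print(best_cutoff(y_true, scores, beta=2.0))  # recall-weighted (F2) cutoff
```

With this toy data the F1-optimal cutoff is 0.51 (F1 of about 0.857), while F2 moves the cutoff down to 0.21, accepting more false positives in exchange for catching every positive. The same idea applies to the real model: pick the metric first, then let the sweep choose the cutoff instead of trying ratios such as 45/55 by hand.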