I am training a random forest classifier in R using mlr for binary classification.
My classes are well balanced.
0 1
0.5162791 0.4837209
I've tuned my model in various ways by modifying the number of trees and mtry.
But I am having trouble picking the right accuracy metrics and determining what the cutoff should be.
Currently I have
tpr.test.mean fpr.test.mean fnr.test.mean fpr.test.mean acc.test.mean mmce.test.mean
0.7908072 0.2872358 0.2091928 0.2872358 0.7531250 0.2468750
f1.test.mean
0.7736447
How can I determine what the ideal cutoff should be for my classes? So far I found 45/55 to work best but is there a better way of doing this? What accuracy metrics are usually the best for binary classifiers?
F1 is usually a safe bet. A classifier cannot "trick" it by reaching 100% recall or 100% precision alone: F1 is the harmonic mean of the two, so both have to be high for a good score.
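To make the harmonic-mean point concrete, here is a minimal base-R sketch. The helper `f1_score` and the precision/recall values are made up for illustration; they are not taken from your model.

```r
# F1 as the harmonic mean of precision and recall (hypothetical helper).
f1_score <- function(precision, recall) {
  2 * precision * recall / (precision + recall)
}

# A classifier that predicts "positive" for almost everything:
# perfect recall, poor precision (illustrative values).
lopsided <- f1_score(precision = 0.10, recall = 1.00)

# A classifier with moderate but matched precision and recall.
balanced <- f1_score(precision = 0.75, recall = 0.75)

print(lopsided)              # about 0.18, far below the arithmetic mean of 0.55
print(balanced)              # 0.75
print(mean(c(0.10, 1.00)))   # 0.55, what a plain average would report
```

The lopsided classifier is dragged down by its weak side, which is the behaviour described above.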
Of course, there are exceptions, such as valuing recall more than precision (e.g. in cancer diagnosis).
So the metric should reflect what you are ultimately trying to optimize.
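For the cutoff itself, one systematic option is to sweep a grid of thresholds over cross-validated predicted probabilities and pick the one that maximizes the metric you care about, rather than comparing a few values by hand. The sketch below uses plain base R on simulated probabilities, so the data, the helper `metrics_at`, and the grid are all assumptions for illustration. mlr offers `generateThreshVsPerfData()` and `tuneThreshold()` for the same job on a prediction object.

```r
# Simulated stand-in for out-of-fold predicted probabilities
# (illustrative only, not the asker's data).
set.seed(1)
n     <- 1000
truth <- rbinom(n, 1, 0.5)   # 1 = positive class
prob  <- plogis(rnorm(n, mean = ifelse(truth == 1, 0.8, -0.8)))

# Confusion-matrix metrics at a single cutoff (hypothetical helper).
metrics_at <- function(cutoff, truth, prob) {
  pred <- as.integer(prob >= cutoff)
  tp <- sum(pred == 1 & truth == 1)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  tn <- sum(pred == 0 & truth == 0)
  c(cutoff = cutoff,
    tpr = tp / (tp + fn),
    fpr = fp / (fp + tn),
    acc = (tp + tn) / length(truth),
    f1  = 2 * tp / (2 * tp + fp + fn))
}

# Evaluate every cutoff on the grid; one row per cutoff.
cutoffs <- seq(0.05, 0.95, by = 0.05)
sweep   <- as.data.frame(t(sapply(cutoffs, metrics_at,
                                  truth = truth, prob = prob)))

# The best cutoff depends on which metric you optimize.
best_f1  <- sweep[which.max(sweep$f1), ]
best_acc <- sweep[which.max(sweep$acc), ]
print(best_f1)
print(best_acc)
```

Choose the cutoff on cross-validated or validation predictions and report the final numbers on data that was not used for that choice, otherwise the reported performance will be optimistic. If recall matters more than precision, as in the diagnosis example, replace `f1` with a measure that weights false negatives more heavily.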