
Apache Mahout Naive Bayes Training size


I am training a data set which has 2 categories using the Naive Bayes algorithm.

Do the 2 categories need to contain an equal number of training examples for the word weights to be estimated well, or is this not necessary?

Thanks


Solution

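  • It's not necessary for the two categories to be the same size. Naive Bayes estimates the word weights for each category from that category's own documents, so what matters is that each category has enough samples to avoid overfitting.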
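
To illustrate why unequal category sizes are workable, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. It is not Mahout's API or its exact weighting scheme; the function names `train` and `classify` and the toy spam/ham documents are made up for this example. It assumes Laplace (add-one) smoothing and shows that the smaller category can still win when its words match.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (label, tokens). Returns log priors, log likelihoods, vocab."""
    vocab = {tok for _, toks in docs for tok in toks}
    docs_per_class = Counter(label for label, _ in docs)

    # Word counts are collected separately for each category.
    word_counts = {label: Counter() for label in docs_per_class}
    for label, toks in docs:
        word_counts[label].update(toks)

    # The prior reflects how common each category is in the training set.
    log_prior = {c: math.log(n / len(docs)) for c, n in docs_per_class.items()}

    # Add-one smoothing (an assumption of this sketch) avoids zero probabilities.
    log_like = {}
    for c, counts in word_counts.items():
        total = sum(counts.values()) + len(vocab)
        log_like[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return log_prior, log_like, vocab


def classify(tokens, log_prior, log_like, vocab):
    """Return the category with the highest log posterior score."""
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy, deliberately unbalanced training set: 6 "spam" documents, 2 "ham".
docs = [("spam", ["buy", "cheap", "pills"])] * 6 + [("ham", ["meeting", "at", "noon"])] * 2
log_prior, log_like, vocab = train(docs)

print(classify(["meeting", "noon"], log_prior, log_like, vocab))
print(classify(["cheap", "pills"], log_prior, log_like, vocab))
```

The imbalance only enters through the prior term. With very few documents in a category, the per-word estimates for that category become noisy, which is where the overfitting risk comes from.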