I'm working on a document classification project using a naive Bayes classifier in Python, with the nltk module. The documents are from the Reuters dataset. I performed preprocessing steps such as stemming and stopword elimination, then computed the tf-idf of the index terms. I used these values to train the classifier, but the accuracy is very poor (53%). What should I do to improve the accuracy?
A few points that might help:
You may also find that alternative weighting techniques such as log(1 + TF) * log(IDF) improve accuracy. Good luck!
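
For reference, here is a minimal sketch of the log(1 + TF) * log(IDF) weighting in plain Python. It assumes IDF means the raw ratio N / df (number of documents divided by the number of documents containing the term), so log(IDF) is log(N / df). The function name `sublinear_tfidf` and the toy documents are made up for illustration, and the input is assumed to be already stemmed and stopword-filtered token lists.

```python
import math
from collections import Counter

def sublinear_tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Weight = log(1 + TF) * log(N / df), where IDF is assumed to be the
    raw ratio N / df (an assumption, not necessarily what nltk or any
    other library computes).
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weighted = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts within this document
        weighted.append({
            term: math.log(1 + count) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weighted

# Hypothetical toy corpus, already tokenized and stemmed.
docs = [
    ["oil", "price", "rise", "oil"],
    ["wheat", "price", "fall"],
    ["oil", "export", "rise"],
]
weights = sublinear_tfidf(docs)
```

The log on the term frequency dampens the effect of a word repeated many times in one document, and a term that appears in every document gets weight 0. Whether this helps on the Reuters data is something you would have to check on a held-out set.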