I am using apache mahout for performing sentiment analysis in the customer support domain. Since I am not able to get a proper training data set, I made my own. Now I have 100 support mails for positive sentiment and 100 for negative.
But the problem is, I am not able to achieve accuracy. It stays somewhere around 55%, which is pathetic. Some 70% and around accuracy will be satisfactory. And also note that I am using a complimentary naive bayes classifier of apache mahout.
Coming to the question precisely, is it the smaller data set size that is bringing down the accuracy? If not, where should I tweak?
Only for the benefit of those looking into this question in future, I will share the ways in which I tweaked the accuracy of my classifier from 50 to around 78%
This should dramatically raise your accuracy.