machine-learning classification mahout sentiment-analysis

Sentiment analysis using Apache Mahout


I am planning to develop a system that predicts the mood of a given text (sentiment analysis, in short).

I would prefer Apache Mahout because the data set is very large and the system needs to scale and work in real time. Please suggest algorithms that Apache Mahout provides that are suitable for sentiment analysis.


Solution

  • If you have labeled training data, you could try a Naive Bayes classifier, which is one of the simplest supervised learning algorithms (and is supported by Mahout). If that is not sufficient for some reason, you could try more involved algorithms such as logistic regression.

    If you don't have labeled data, you are out of luck: you will need to get some for this to work (e.g. by hiring people to label your data via Amazon's Mechanical Turk).

    By the way, how much data are we talking about? If it is up to a few hundred gigabytes, you don't need Hadoop/Mahout to train this type of model, unless the data is already sitting in Hadoop, of course.
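
To make the Naive Bayes suggestion concrete, here is a minimal sketch of a multinomial Naive Bayes sentiment classifier with Laplace smoothing, written in plain Python. It is not Mahout's API and not tuned for large data; the function names (`tokenize`, `train`, `predict`) and the tiny training set are made up purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real system would do more."""
    return text.lower().split()


def train(labeled_docs):
    """Collect per-label document counts, per-label word counts, and the vocabulary."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior, using Laplace smoothing."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            # Add-one smoothing so unseen (word, label) pairs never give log(0).
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, hand-written training data (an assumption for this example only).
TRAINING = [
    ("i love this movie", "positive"),
    ("what a great wonderful day", "positive"),
    ("i hate this movie", "negative"),
    ("what a terrible awful day", "negative"),
]

model = train(TRAINING)
```

Calling `predict(model, "i love this great day")` scores each label and returns the more likely one. The labeled examples are what make this work, which is why the lack of labeled data is a blocker; on a single machine, the same idea would usually be run through an existing library rather than hand-rolled code like this.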