
How do I use a Mahout naive Bayes model in a real project?


My environment: Mahout 0.7, Hadoop 1.0.3

What I have done: installed Mahout and tested the naive Bayes example (20newsgroup); it works perfectly.

What I want to achieve: there are trainnb and testnb jobs, but in a real application we need an interface like this:

input: [text to be classified] [model to use]
output: the list of class labels (sorted by probability)
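
To make the output side of that interface concrete, here is a minimal sketch in plain Java with no Mahout dependency. LabelRanker and ScoredLabel are hypothetical names introduced only for illustration, and the sketch assumes the classifier hands back one score per label, in the same order as the label list, with a higher score meaning a more likely class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Mahout: pairs each label with its score
// and sorts the pairs from most to least likely.
final class LabelRanker {

    /** One class label together with the score the classifier gave it. */
    static final class ScoredLabel {
        final String label;
        final double score;

        ScoredLabel(String label, double score) {
            this.label = label;
            this.score = score;
        }
    }

    /**
     * Assumption: scores[i] belongs to labels.get(i), and a higher score
     * means a more likely class.
     */
    static List<ScoredLabel> rank(List<String> labels, double[] scores) {
        if (labels.size() != scores.length) {
            throw new IllegalArgumentException("labels and scores differ in length");
        }
        List<ScoredLabel> ranked = new ArrayList<ScoredLabel>();
        for (int i = 0; i < scores.length; i++) {
            ranked.add(new ScoredLabel(labels.get(i), scores[i]));
        }
        // Sort in descending order of score.
        Collections.sort(ranked, new Comparator<ScoredLabel>() {
            @Override
            public int compare(ScoredLabel a, ScoredLabel b) {
                return Double.compare(b.score, a.score);
            }
        });
        return ranked;
    }
}
```

The scores would come from whatever the classifier returns for a single document; whether they can be read as probabilities depends on the classifier, so the sketch only relies on their ordering.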

What I have tried: first I tried to do what the testnb job does, but I don't know how to transform the text into the 'VectorWritable' object that the job reads and hands to the StandardNaiveBayesClassifier or ComplementaryNaiveBayesClassifier.

Code:

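// Writer for the (label, score vector) pairs produced by the classifier.
SequenceFile.Writer writer = new SequenceFile.Writer(
    fs, getConf(), getOutputPath(), Text.class, VectorWritable.class);
Path inputFile = new Path(getOption("if"));
// Reader over the already-vectorized input documents.
SequenceFile.Reader reader = new SequenceFile.Reader(fs, getInputPath(), getConf());
Text key = new Text();
VectorWritable vw = new VectorWritable();
while (reader.next(key, vw)) {
  // Assumes keys of the form "/label/docId", so element [1] is the label.
  writer.append(
      new Text(SLASH.split(key.toString())[1]),
      new VectorWritable(classifier.classifyFull(vw.get())));
}
reader.close();
writer.close();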

Any help will be appreciated!

I think I figured it out: chapter 16 of 'Mahout in Action' has some example code.


Solution

  • Did you check this article?

    http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/

    I went through the tutorial and everything worked fine.
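
    The missing piece in the question is turning raw text into the same kind of feature vector that training used. The general idea, as I understand it, is to tokenize the text, look each token up in the dictionary produced at training time, and weight it by TF-IDF. Here is a rough sketch of that step in plain Java with no Mahout dependency. TextVectorizer and its maps are hypothetical names, the tokenizer is a crude regex split, and the weight is the textbook tf * ln(N / df), which may not match what Mahout computes during vectorization.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not Mahout API: turns raw text into a sparse
// (feature index -> weight) map using data collected at training time.
final class TextVectorizer {
    private final Map<String, Integer> dictionary;    // term -> feature index
    private final Map<Integer, Integer> docFrequency; // feature index -> training docs containing the term
    private final int numDocs;                        // total number of training documents

    TextVectorizer(Map<String, Integer> dictionary,
                   Map<Integer, Integer> docFrequency,
                   int numDocs) {
        this.dictionary = dictionary;
        this.docFrequency = docFrequency;
        this.numDocs = numDocs;
    }

    Map<Integer, Double> vectorize(String text) {
        // Count occurrences of each known term; unknown terms are skipped.
        Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            Integer index = dictionary.get(token);
            if (index == null) {
                continue;
            }
            Integer old = counts.get(index);
            counts.put(index, old == null ? 1 : old + 1);
        }
        // Textbook TF-IDF weight (an assumption): tf * ln(numDocs / df).
        Map<Integer, Double> vector = new TreeMap<Integer, Double>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            Integer df = docFrequency.get(e.getKey());
            if (df == null || df == 0) {
                continue;
            }
            vector.put(e.getKey(), e.getValue() * Math.log((double) numDocs / df));
        }
        return vector;
    }
}
```

    In a real setup the index/weight pairs would be copied into a Mahout Vector and passed to classifier.classifyFull, and the dictionary and document frequencies would be loaded from the files written when the training data was vectorized.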