My environment: Mahout 0.7, Hadoop 1.0.3.
What I have done: installed Mahout and tested the naive Bayes example (20newsgroups), and it works perfectly.
What I want to achieve: there are the trainnb and testnb jobs, but in a real application we need an interface like this:
Input: [text to be classified] [model to use]. Output: the list of class labels, sorted by probability.
What I have tried: first I tried to do what the testnb job does, but I don't know how to transform the text into the 'VectorWritable' object that is handled by StandardNaiveBayesClassifier or ComplementaryNaiveBayesClassifier.
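To make the missing step concrete, here is a minimal sketch of turning raw text into a sparse term-count vector by looking tokens up in a term-to-index dictionary. It is plain Java with no Mahout classes, and the class name, the toTermFrequencies helper, the regex tokenizer and the three-word dictionary are all assumptions made for illustration. With Mahout, the dictionary would presumably be the one written when the training vectors were built, the map would become a sparse Vector, and the counts would have to be weighted the same way as the training data (TF-IDF, as far as I can tell, in the 20newsgroups example).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class TextVectorizerSketch {

    // Hypothetical helper: maps raw text to a sparse vector of term counts.
    // Keys are dictionary indices, values are how often that term occurred.
    static Map<Integer, Double> toTermFrequencies(String text, Map<String, Integer> dictionary) {
        Map<Integer, Double> vector = new TreeMap<>();
        // Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            Integer index = dictionary.get(token);
            if (index == null) {
                continue; // term not seen at training time (or an empty token)
            }
            vector.merge(index, 1.0, Double::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        // Toy dictionary standing in for the one produced during training.
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("hockey", 0);
        dictionary.put("game", 1);
        dictionary.put("space", 2);

        Map<Integer, Double> vector =
            toTermFrequencies("Hockey game tonight, great hockey!", dictionary);
        System.out.println(vector); // prints {0=2.0, 1=1.0}
    }
}
```

Terms that are missing from the dictionary are dropped, since the model has no weight for them anyway.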
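Code (adapted from the testnb job):
    // Output: one (label, score vector) pair per input document.
    SequenceFile.Writer writer = new SequenceFile.Writer(
        fs, getConf(), getOutputPath(), Text.class, VectorWritable.class);
    Path inputFile = new Path(getOption("if"));
    // Input: a sequence file of already vectorized documents.
    Reader reader = new Reader(fs, getInputPath(), getConf());
    Text key = new Text();
    VectorWritable vw = new VectorWritable();
    while (reader.next(key, vw)) {
        // The key looks like "/label/docId"; element 1 after splitting is the label.
        writer.append(
            new Text(SLASH.split(key.toString())[1]),
            new VectorWritable(classifier.classifyFull(vw.get()))
        );
    }
    // Release the underlying file handles.
    writer.close();
    reader.close();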
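For the output side, classifyFull in the loop above yields one score per class, so the sorted label list comes down to ordering label indices by score. A minimal sketch in plain Java follows. The class name, the rankLabels helper, the label names and the score values are made up for illustration, and it assumes the scores have already been copied into a double[] whose positions line up with the label list (in Mahout that mapping would presumably come from the label index written by trainnb). I would also treat the values as scores rather than calibrated probabilities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class LabelRankingSketch {

    // Hypothetical helper: returns the labels ordered from highest to lowest score.
    // scores[i] is assumed to be the score of labels.get(i).
    static List<String> rankLabels(double[] scores, List<String> labels) {
        if (scores.length != labels.size()) {
            throw new IllegalArgumentException("scores and labels must have the same length");
        }
        // Sort the indices rather than the scores so labels stay attached to them.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());

        List<String> ranked = new ArrayList<>();
        for (int i : order) {
            ranked.add(labels.get(i));
        }
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up labels and scores, only to show the ordering.
        List<String> labels = Arrays.asList("rec.sport.hockey", "sci.space", "talk.politics.misc");
        double[] scores = {-120.5, -98.2, -140.0};
        System.out.println(rankLabels(scores, labels));
        // prints [sci.space, rec.sport.hockey, talk.politics.misc]
    }
}
```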
Any help will be appreciated!
I think I figured it out: chapter 16 of 'Mahout in Action' has some example code.
Did you check this article?
I went through the tutorial and everything worked fine.