java text machine-learning opennlp categorization

OpenNLP classifier output

At the moment I'm using the following code to train a classifier model:

    final String iterations = "1000";
    final String cutoff = "0";
    InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/classifierA.txt"));
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, iterations);
    params.put(TrainingParameters.CUTOFF_PARAM, cutoff);
    params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);

    DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());

    // Close the stream after writing so the model file is fully flushed to disk
    try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("src/main/resources/models/model.bin"))) {
        model.serialize(modelOut);
    }

    return model;

This works, and after every run I get the following output:

    Indexing events with TwoPass using cutoff of 0

        Computing event counts...  done. 1474 events
        Indexing...  done.
    Collecting events... Done indexing in 0,03 s.
    Incorporating indexed data for training...
    done.
        Number of Event Tokens: 1474
            Number of Outcomes: 2
          Number of Predicates: 4149
    Computing model parameters...
    Stats: (998/1474) 0.6770691994572592
    ...done.

Could someone explain what this output means, and whether it says anything about the model's accuracy?

Solution

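Looking at the source, we can tell this output is produced by the NaiveBayesTrainer::trainModel method, which first reports the size of the indexed training data: the number of unique events, outcomes (the categories) and predicates (the features):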

    public AbstractModel trainModel(DataIndexer di) {
        // ...
        display("done.\n");
        display("\tNumber of Event Tokens: " + numUniqueEvents + "\n");
        display("\t    Number of Outcomes: " + numOutcomes + "\n");
        display("\t  Number of Predicates: " + numPreds + "\n");
        display("Computing model parameters...\n");
        MutableContext[] finalParameters = findParameters();
        display("...done.\n");
        // ...
    }

If you look at the code of findParameters(), you'll see that it calls the trainingStats() method, which computes the accuracy and prints the Stats line:

    private double trainingStats(EvalParameters evalParams) {
        // ...
        double trainingAccuracy = (double) numCorrect / numEvents;
        display("Stats: (" + numCorrect + "/" + numEvents + ") " + trainingAccuracy + "\n");
        return trainingAccuracy;
    }
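The elided part of trainingStats() is not shown here, so the following is only a rough sketch of the idea, not OpenNLP's actual code. It assumes the trainer scores every outcome for each training event, takes the best-scoring outcome as the prediction, and counts how often that matches the label. The class name, the scores array and the labels array are all made up for illustration.

```java
// Hypothetical, self-contained sketch; none of these names come from OpenNLP.
final class TrainingAccuracySketch {

    /** Returns the index of the highest score, i.e. the predicted outcome. */
    static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * Fraction of events whose predicted outcome equals the labeled outcome.
     * scores[e][o] is the (assumed) model score of outcome o for event e.
     */
    static double trainingAccuracy(double[][] scores, int[] labels) {
        int numCorrect = 0;
        for (int e = 0; e < labels.length; e++) {
            if (argMax(scores[e]) == labels[e]) {
                numCorrect++;
            }
        }
        // Same ratio as in the snippet above: numCorrect / numEvents
        return (double) numCorrect / labels.length;
    }

    public static void main(String[] args) {
        // Four toy events with two outcomes each; the third one is misclassified.
        double[][] scores = { {0.9, 0.1}, {0.4, 0.6}, {0.7, 0.3}, {0.2, 0.8} };
        int[] labels = { 0, 1, 1, 1 };
        System.out.println(trainingAccuracy(scores, labels)); // 3 of 4 correct
    }
}
```

Because the same events are used for training and for this count, the resulting number describes how well the model fits the training set.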

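TL;DR: the Stats: (998/1474) 0.6770691994572592 part of the output is the accuracy you're looking for. It means 998 of the 1474 training events were classified correctly, roughly 67.7%. Keep in mind that this is the accuracy on the training data itself, not on unseen documents.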
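If you want to pick that number up from the log programmatically, a small sketch like the one below could do it. It assumes the line keeps the exact "Stats: (correct/total) ratio" shape shown above; the class and method names are hypothetical and not part of OpenNLP.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; it only parses the log line and does not touch OpenNLP.
final class StatsLineSketch {

    // Captures the two integers in "Stats: (998/1474) ..."
    private static final Pattern STATS = Pattern.compile("Stats: \\((\\d+)/(\\d+)\\)");

    /** Recomputes the training accuracy from the counts in a Stats line. */
    static double accuracyFrom(String line) {
        Matcher m = STATS.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a Stats line: " + line);
        }
        int numCorrect = Integer.parseInt(m.group(1));
        int numEvents = Integer.parseInt(m.group(2));
        return (double) numCorrect / numEvents;
    }

    public static void main(String[] args) {
        double accuracy = accuracyFrom("Stats: (998/1474) 0.6770691994572592");
        System.out.printf("training accuracy: %.1f%%%n", accuracy * 100);
    }
}
```

Recomputing the ratio from the two counts, rather than parsing the printed decimal, sidesteps any locale-dependent number formatting in the log.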