
Test Maximum Entropy classifier


Is it possible to classify new data through the Stanford Maximum Entropy classifier WITHOUT creating an external file that includes all the features?

In other words, I have a test file in the following format:

token1 \t feature1_1 \t ... \t feature1_N \t goldLabel1

...

tokenM \t featureM_1 \t ... \t featureM_N \t goldLabelM

I was wondering if it is possible to use a data structure to include test data without creating an external file.


Solution

  • If you review this method (line 409 in ColumnDataClassifier)

    private Pair<GeneralDataset<String,String>, List<String[]>> readDataset(String filename, boolean inTestPhase) {
    

    you can see how the code goes from a file path to a Pair<GeneralDataset<String,String>, List<String[]>>.

    That pair is the key data object needed for evaluation.

    If you review this method (line 2158 in ColumnDataClassifier) you can see how the evaluation is done

    public Pair<Double, Double> testClassifier(String testFile) {

    If you review the main() method (line 2011) you will see an example of the ColumnDataClassifier being built.

    By looking at these three methods you can write additional code to do what you want to do and avoid writing to disk.
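
As a rough illustration of the idea above, here is a minimal sketch that keeps the test rows in a List<String[]> and scores them without touching disk. It uses only the JDK. The class name InMemoryEvaluation and the helpers toLine and accuracy are hypothetical, and the classifier is passed in as a plain Function from a tab-separated line to a predicted label. With CoreNLP, that function could plausibly wrap ColumnDataClassifier.makeDatumFromLine(line) followed by classOf on the trained classifier, which is how the library's ClassifierDemo appears to do it, but check this against the version you are using. The sketch also assumes the gold label is the last column, as in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class InMemoryEvaluation {

    /** Joins one in-memory row (token, features..., gold label) into a tab-separated line. */
    static String toLine(String[] row) {
        return String.join("\t", row);
    }

    /**
     * Returns the fraction of rows whose predicted label equals the gold label.
     * Assumption: the gold label is the last column of every row.
     * classifyLine is a stand-in for the real classifier call.
     */
    static double accuracy(List<String[]> rows, Function<String, String> classifyLine) {
        if (rows.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (String[] row : rows) {
            String gold = row[row.length - 1];
            String predicted = classifyLine.apply(toLine(row));
            if (gold.equals(predicted)) {
                correct++;
            }
        }
        return (double) correct / rows.size();
    }

    public static void main(String[] args) {
        // Test data held in memory instead of in an external file.
        List<String[]> rows = Arrays.asList(
            new String[] {"token1", "feature1_1", "feature1_2", "A"},
            new String[] {"token2", "feature2_1", "feature2_2", "B"});

        // Hypothetical stand-in classifier that always predicts "A".
        // In real use this would delegate to the trained CoreNLP classifier.
        Function<String, String> alwaysA = line -> "A";

        System.out.println("accuracy = " + accuracy(rows, alwaysA));
    }
}
```

Building the same tab-separated line that the file reader would see keeps the column settings in your properties file valid, so the in-memory path and the file path should produce the same features.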