java, stanford-nlp

How to pass a String to AbstractSequenceClassifier.classifyAndWriteAnswersKBest in CoreNLP?


AbstractSequenceClassifier.classifyAndWriteAnswersKBest has overloads that take either a filename or an ObjectBank<List<IN>>, but ObjectBank's documentation doesn't make clear how to create such an ObjectBank without involving a file.

I'm using CoreNLP 3.7.0 with Java 8.


Solution

  • You should just use this method instead:

    Counter<List<IN>> classifyKBest(List<IN> doc, Class<? extends CoreAnnotation<String>> answerField, int k)
    

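    It returns a Counter that maps each of the k best label sequences to its score. Each sequence is a list of tokens with answerField set to the predicted label.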

    If kBest is that Counter, this line turns it into a list of sequences sorted from highest to lowest score:

    List<List<IN>> sorted = Counters.toSortedList(kBest);
    

    I'm not sure exactly what you're trying to do, but the key step is to turn your String into a List<IN>. Typically IN is CoreLabel, though that depends on the AbstractSequenceClassifier you are working with.

    If you want to run your sequence classifier on a sentence, you can first tokenize it with a StanfordCoreNLP pipeline and then pass the resulting list of tokens to classifyKBest(...).

    For instance, if you are trying to get the k best named entity tag sequences:

    import java.util.List;
    import java.util.Properties;
    import java.util.stream.Collectors;

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.stats.Counter;
    import edu.stanford.nlp.stats.Counters;

    // set up a pipeline that only tokenizes
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize");
    StanfordCoreNLP tokenizerPipeline = new StanfordCoreNLP(props);

    // get list of tokens for example sentence
    String exampleSentence = "...";
    // wrap sentence in an Annotation object
    Annotation annotation = new Annotation(exampleSentence);
    // tokenize sentence
    tokenizerPipeline.annotate(annotation);
    // get the list of tokens
    List<CoreLabel> tokens = annotation.get(CoreAnnotations.TokensAnnotation.class);

    //...
    // classifier should be an AbstractSequenceClassifier<CoreLabel> (e.g. a CRFClassifier<CoreLabel>)

    // get the k best sequences from your abstract sequence classifier
    Counter<List<CoreLabel>> kBestSequences = classifier.classifyKBest(tokens, CoreAnnotations.NamedEntityTagAnnotation.class, 10);
    // sort the k-best examples from highest to lowest score
    List<List<CoreLabel>> sortedKBest = Counters.toSortedList(kBestSequences);
    // example: getting the second best list (only if at least two sequences came back)
    if (sortedKBest.size() > 1) {
        List<CoreLabel> secondBest = sortedKBest.get(1);
        // example: print out the tags for the second best list
        System.out.println(secondBest.stream()
                .map(token -> token.get(CoreAnnotations.NamedEntityTagAnnotation.class))
                .collect(Collectors.joining(" ")));
        // example: print out the score for the second best list
        System.out.println(kBestSequences.getCount(secondBest));
    }
    

    If you have more questions, please let me know and I can help out!
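The CoreNLP snippet needs the library and a trained model on the classpath before it will run. To show only how the k-best result is consumed (sort by score, take the second-best sequence, look up its score), here is a dependency-free Java sketch. It is an illustration under stated assumptions, not CoreNLP code: a HashMap stands in for CoreNLP's Counter, the hypothetical toSortedList helper mimics the highest-first ordering of Counters.toSortedList, and the tag sequences and scores are made-up values rather than real classifier output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KBestSketch {

    // Hypothetical stand-in for CoreNLP's Counters.toSortedList:
    // returns the keys ordered from highest to lowest score.
    static <K> List<K> toSortedList(Map<K, Double> scores) {
        List<K> keys = new ArrayList<>(scores.keySet());
        keys.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return keys;
    }

    public static void main(String[] args) {
        // Made-up k-best output: each candidate tag sequence mapped to an invented score.
        Map<List<String>, Double> kBest = new HashMap<>();
        kBest.put(Arrays.asList("PERSON", "O", "LOCATION"), -1.2);
        kBest.put(Arrays.asList("PERSON", "O", "O"), -2.5);
        kBest.put(Arrays.asList("O", "O", "LOCATION"), -3.1);

        List<List<String>> sorted = toSortedList(kBest);

        // Only read the second-best entry if there is one.
        if (sorted.size() > 1) {
            List<String> secondBest = sorted.get(1);
            // Print the tags of the second-best sequence, space-separated.
            System.out.println(String.join(" ", secondBest));
            // Look up its score in the original map.
            System.out.println(kBest.get(secondBest));
        }
    }
}
```

Running main prints `PERSON O O` followed by `-2.5`. The size check is there because the result may contain fewer than k sequences, for instance when the input is so short that there are fewer than k possible labelings, in which case get(1) would throw an IndexOutOfBoundsException.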