
StanfordNLP OpenIE fails


I have StanfordNLP up and running.

My Maven dependencies are as follows:

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
    <classifier>models</classifier>
</dependency>

My code runs just fine as follows:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;

import org.junit.Test;

import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

@Test
public void testTA() throws Exception
{

    Path p = Paths.get("s.txt");

    byte[] encoded = Files.readAllBytes(p);
    String s = new String(encoded);

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse, ner, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    String text = s;

    StringBuffer sb = new StringBuffer();

    sb.append(text);
    sb.append(
            "\n\n\n\n\n\n\n===================================================================\n\n\n\n\n\n\n\n\n\n\n");

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);

    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and
    // has values with custom types
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    sb.append(
            "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n+++++++++++++++++++++++SENTENCES++++++++++++++++++++++++++++\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n");
    for (CoreMap sentence : sentences)
    {
        // traversing the words in the current sentence
        // a CoreLabel is a CoreMap with additional token-specific methods
        sb.append("\n\n\n==============SENTENCE==============\n\n\n");
        sb.append(sentence.toString());
        sb.append("\n");
        for (CoreLabel token : sentence.get(TokensAnnotation.class))
        {
            // this is the text of the token
            sb.append("\n==============TOKEN==============\n");
            String word = token.get(TextAnnotation.class);
            sb.append(word);
            sb.append(" : ");
            // this is the POS tag of the token
            String pos = token.get(PartOfSpeechAnnotation.class);
            sb.append(pos);
            sb.append(" : ");
            // this is the lemma of the token
            String lemma = token.get(LemmaAnnotation.class);
            sb.append(lemma);
            sb.append(" : ");
            // this is the NER label of the token
            String ne = token.get(NamedEntityTagAnnotation.class);
            sb.append(ne);
            sb.append("\n");

        }

        // this is the parse tree of the current sentence
        Tree tree = sentence.get(TreeAnnotation.class);
        sb.append("\n\n\n=====================TREE==================\n\n\n");
        sb.append(tree.toString());

        // this is the Stanford dependency graph of the current sentence
        SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
        sb.append("\n\n\n");
        sb.append(dependencies.toString());
    }
}

However, when I add openie to the pipeline, the code fails.

props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse, ner, dcoref, openie");

The error I get is as follows:

annotator "openie" requires annotator "natlog"

Can anyone advise me on this?


Solution

  • The answer is that annotators in the pipeline can depend on each other. Simply add natlog to the pipeline. Crucially, each dependency must appear in the annotator list before the annotator that requires it, so

    • natlog must be in the pipeline before openie, and
    • depparse must be in the pipeline before natlog,

    and, as an aside,

    • parse must be in the pipeline before dcoref.

    A corrected annotator order is sketched below.
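
For reference, here is a minimal sketch of a working configuration, assuming CoreNLP 3.6.0 as in the question. It lists the annotators in dependency order and then reads the relation triples that openie attaches to each sentence via NaturalLogicAnnotations.RelationTriplesAnnotation. Beyond the imports already used above, it needs java.util.Collection, edu.stanford.nlp.ie.util.RelationTriple, and edu.stanford.nlp.naturalli.NaturalLogicAnnotations.

Properties props = new Properties();
// Prerequisites come before the annotators that need them:
// depparse before natlog, natlog before openie, parse before dcoref.
props.setProperty("annotators",
        "tokenize, ssplit, pos, lemma, ner, parse, dcoref, depparse, natlog, openie");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Annotation document = new Annotation(text);
pipeline.annotate(document);

for (CoreMap sentence : document.get(SentencesAnnotation.class))
{
    // openie stores its extractions on each sentence as RelationTriple objects
    Collection<RelationTriple> triples =
            sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);
    for (RelationTriple triple : triples)
    {
        System.out.println(triple.confidence + "\t"
                + triple.subjectGloss() + "\t"
                + triple.relationGloss() + "\t"
                + triple.objectGloss());
    }
}

If you only need the triples, a trimmed pipeline of "tokenize, ssplit, pos, lemma, depparse, natlog, openie" should also work and run considerably faster, since openie itself does not depend on the constituency parser, NER, or coreference.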