stanford-nlp

Stanford CoreNLP BasicPipelineExample doesn't work


I'm trying to get started with Stanford CoreNLP, and I can't get past the very first simple example from here:

https://stanfordnlp.github.io/CoreNLP/api.html

Here is my code:

package stanford.corenlp;

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import com.google.common.io.Files;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;
import java.util.logging.Level;
import java.util.logging.Logger;

public class StanfordNLP {

    private void test2() {
        // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // read some text in the text variable
        String text = "Now is the time for all good men to come to the aid of their country.";

        // create an empty Annotation just with the given text
        Annotation document = new Annotation(text);

        // run all Annotators on this text
        pipeline.annotate(document);

    }

    public static void main(String[] args) throws IOException {
        StanfordNLP nlp = new StanfordNLP();
        nlp.test2();
    }
}

Here is the stack trace:

Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:791)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:312)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:265)
    at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:85)
    at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:73)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(AnnotatorImplementations.java:55)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$42(StanfordCoreNLP.java:496)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(StanfordCoreNLP.java:533)
    at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
    at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
    at stanford.corenlp.StanfordNLP.test2(StanfordNLP.java:93)
    at stanford.corenlp.StanfordNLP.main(StanfordNLP.java:108)
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:480)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:789)
    ... 16 more
C:\Users\Greg\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 0 seconds)

What am I missing?


Solution

  • First, you need to add stanford-corenlp-3.8.0.jar to the classpath; that makes the red error marks in NetBeans go away. But you also need to add stanford-corenlp-3.8.0-models.jar to the classpath to prevent the error documented above. Adding only the folder the jar resides in to the classpath does not work. Details like this should never be left out of beginner documentation!
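As a quick sanity check (a stdlib-only sketch; the class name ModelCheck is my own, and the resource path is the one from the stack trace above), you can probe the classpath for the tagger model before building the pipeline. If this prints "models jar missing", the models jar is not on the classpath:

```java
// Probe the classpath for the POS tagger model that the pipeline tries to load.
public class ModelCheck {
    public static void main(String[] args) {
        String model =
            "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger";
        // getResource returns null when the models jar is not on the classpath
        boolean found = ModelCheck.class.getClassLoader().getResource(model) != null;
        System.out.println(found ? "models jar found" : "models jar missing");
    }
}
```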

    If you continue with the example and add in the remaining code, more errors occur. The code then looks like:

    package stanford.corenlp;

    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import com.google.common.io.Files;

    import edu.stanford.nlp.dcoref.CorefChain;
    import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
    import edu.stanford.nlp.util.CoreMap;
    import edu.stanford.nlp.util.PropertiesUtils;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class StanfordNLP {

        private void test2() {
            // creates a StanfordCoreNLP object with tokenization, sentence splitting,
            // POS tagging, lemmatization, parsing, and natural logic annotation
            StanfordCoreNLP pipeline = new StanfordCoreNLP(
                PropertiesUtils.asProperties(
                    "annotators", "tokenize,ssplit,pos,lemma,parse,natlog",
                    "ssplit.isOneSentence", "true",
                    "parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz",
                    "tokenize.language", "en"));

            // read some text in the text variable
            String text = "Now is the time for all good men to come to the aid of their country.";

            // create an empty Annotation just with the given text
            Annotation document = new Annotation(text);

            // run all Annotators on this text
            pipeline.annotate(document);

            // these are all the sentences in this document
            // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
            List<CoreMap> sentences = document.get(SentencesAnnotation.class);

            for (CoreMap sentence : sentences) {
                // traversing the words in the current sentence
                // a CoreLabel is a CoreMap with additional token-specific methods
                for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                    // this is the text of the token
                    String word = token.get(TextAnnotation.class);
                    // this is the POS tag of the token
                    String pos = token.get(PartOfSpeechAnnotation.class);
                    // this is the NER label of the token
                    // (null here, since the ner annotator is not in the list above)
                    String ne = token.get(NamedEntityTagAnnotation.class);

                    System.out.println("word=" + word + ", pos=" + pos + ", ne=" + ne);
                }

                // this is the parse tree of the current sentence
                Tree tree = sentence.get(TreeAnnotation.class);

                // this is the Stanford dependency graph of the current sentence
                SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
            }

            // This is the coreference link graph
            // Each chain stores a set of mentions that link to each other,
            // along with a method for getting the most representative mention
            // Both sentence and token offsets start at 1!
            // (null here, since the dcoref annotator is not in the list above)
            Map<Integer, CorefChain> graph =
                document.get(CorefChainAnnotation.class);
        }

        public static void main(String[] args) throws IOException {
            StanfordNLP nlp = new StanfordNLP();
            nlp.test2();
        }
    }
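As an aside, PropertiesUtils.asProperties is just a varargs convenience for building a java.util.Properties from alternating key/value strings. A stdlib-only equivalent looks roughly like this (the class and helper names are my own, for illustration):

```java
import java.util.Properties;

public class PropsDemo {
    // minimal stand-in for PropertiesUtils.asProperties: alternating key, value pairs
    static Properties asProperties(String... pairs) {
        Properties props = new Properties();
        for (int i = 0; i < pairs.length - 1; i += 2) {
            props.setProperty(pairs[i], pairs[i + 1]);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties p = asProperties(
            "annotators", "tokenize,ssplit,pos,lemma,parse,natlog",
            "ssplit.isOneSentence", "true");
        // look up one of the keys we just set
        System.out.println(p.getProperty("annotators"));
    }
}
```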
    

    And the stack trace becomes:

    run:
    Adding annotator tokenize
    Adding annotator ssplit
    Adding annotator pos
    Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.6 sec].
    Adding annotator lemma
    Adding annotator parse
    Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to open "edu/stanford/nlp/models/srparser/englishSR.ser.gz" as class path, filename or URL
        at edu.stanford.nlp.parser.common.ParserGrammar.loadModel(ParserGrammar.java:187)
        at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:219)
        at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:121)
        at edu.stanford.nlp.pipeline.AnnotatorImplementations.parse(AnnotatorImplementations.java:115)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$50(StanfordCoreNLP.java:504)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(StanfordCoreNLP.java:533)
        at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
        at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
        at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
        at stanford.corenlp.StanfordNLP.test2(StanfordNLP.java:95)
        at stanford.corenlp.StanfordNLP.main(StanfordNLP.java:145)
    Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/srparser/englishSR.ser.gz" as class path, filename or URL
        at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:480)
        at edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:309)
        at edu.stanford.nlp.parser.common.ParserGrammar.loadModel(ParserGrammar.java:184)
        ... 14 more
    C:\Users\Greg\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
    BUILD FAILED (total time: 1 second)
    

    I finally got past this point by downloading stanford-english-corenlp-2017-06-09-models.jar and adding it to the classpath. You can get it from the "English" download link here:

    https://stanfordnlp.github.io/CoreNLP/download.html

    You need to do this despite the message on the download page saying that everything needed for English is already provided in the CoreNLP download!
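If you would rather avoid the extra English download, one workaround (an untested sketch; the file name corenlp.properties is my own) is to point parse.model at the PCFG parser model instead, which does ship inside stanford-corenlp-3.8.0-models.jar:

```properties
# corenlp.properties -- hypothetical file; CoreNLP can load a properties file
# passed to the StanfordCoreNLP constructor instead of a Properties object
annotators = tokenize, ssplit, pos, lemma, parse
# englishPCFG.ser.gz is included in stanford-corenlp-3.8.0-models.jar,
# so this configuration needs no extra English models download
parse.model = edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz
```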