I'm trying to get started with Stanford CoreNLP and can't even get past the very first simple example from here.
Here is my code:
package stanford.corenlp;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import com.google.common.io.Files;
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;
import java.util.logging.Level;
import java.util.logging.Logger;
private void test2() {
// creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// read some text in the text variable
String text = "Now is the time for all good men to come to the aid of their country.";
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
}
public static void main(String[] args) throws IOException {
StanfordNLP nlp = new StanfordNLP();
nlp.test2();
}
}
Here is the stacktrace:
Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:791)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:312)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:265)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:85)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:73)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(AnnotatorImplementations.java:55)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$42(StanfordCoreNLP.java:496)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(StanfordCoreNLP.java:533)
at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
at stanford.corenlp.StanfordNLP.test2(StanfordNLP.java:93)
at stanford.corenlp.StanfordNLP.main(StanfordNLP.java:108)
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:480)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:789)
... 16 more
C:\Users\Greg\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 0 seconds)
What am I missing?
First of all you need to add to the class path stanford-corenlp-3.8.0.jar. That makes the red error marks in NetBeans go away. But you also need to add stanford-corenlp-3.8.0-models.jar to the class path to prevent the error I documented. Adding the folder it resides in to the classpath doesn't work. Details like this should never be left out of beginner documentation!
Now if you continue with the example and add in the new stuff, more errors occur. For example, the code will then look like:
package stanford.corenlp;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import com.google.common.io.Files;
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;
import java.util.logging.Level;
import java.util.logging.Logger;
private void test2() {
// creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(
PropertiesUtils.asProperties(
"annotators", "tokenize,ssplit,pos,lemma,parse,natlog",
"ssplit.isOneSentence", "true",
"parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz",
"tokenize.language", "en"));
// read some text in the text variable
String text = "Now is the time for all good men to come to the aid of their country.";
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence: sentences) {
// traversing the words in the current sentence
// a CoreLabel is a CoreMap with additional token-specific methods
for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
// this is the text of the token
String word = token.get(TextAnnotation.class);
// this is the POS tag of the token
String pos = token.get(PartOfSpeechAnnotation.class);
// this is the NER label of the token
String ne = token.get(NamedEntityTagAnnotation.class);
System.out.println("word="+word +", pos="+pos +", ne="+ne);
}
// this is the parse tree of the current sentence
Tree tree = sentence.get(TreeAnnotation.class);
// this is the Stanford dependency graph of the current sentence
SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
}
// This is the coreference link graph
// Each chain stores a set of mentions that link to each other,
// along with a method for getting the most representative mention
// Both sentence and token offsets start at 1!
Map<Integer, CorefChain> graph =
document.get(CorefChainAnnotation.class);
}
public static void main(String[] args) throws IOException {
StanfordNLP nlp = new StanfordNLP();
nlp.test2();
}
}
And the stack trace becomes:
run:
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.6 sec].
Adding annotator lemma
Adding annotator parse
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to open "edu/stanford/nlp/models/srparser/englishSR.ser.gz" as class path, filename or URL
at edu.stanford.nlp.parser.common.ParserGrammar.loadModel(ParserGrammar.java:187)
at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:219)
at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:121)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.parse(AnnotatorImplementations.java:115)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$50(StanfordCoreNLP.java:504)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(StanfordCoreNLP.java:533)
at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
at stanford.corenlp.StanfordNLP.test2(StanfordNLP.java:95)
at stanford.corenlp.StanfordNLP.main(StanfordNLP.java:145)
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/srparser/englishSR.ser.gz" as class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:480)
at edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:309)
at edu.stanford.nlp.parser.common.ParserGrammar.loadModel(ParserGrammar.java:184)
... 14 more
C:\Users\Greg\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 1 second)
I finally got past this point by downloading and adding to the classpath stanford-english-corenlp-2017-06-09-models.jar which you can get from the "English" download link here:
You need to do this despite the message on the Download page saying that everything needed for English is already provided in the corenlp download!