Tags: java, scala, nlp, stanford-nlp

ClassNotFoundException: edu.stanford.nlp.tagger.maxent.ExtractorNonAlphanumeric


I am trying to compile and run the Stanford NLP Java example from this page: https://stanfordnlp.github.io/CoreNLP/api.html#quickstart-with-convenience-wrappers (the first example, BasicPipelineExample).

The page says the example was developed for version 3.9.0, but I could not find that release anywhere, so I am using 3.9.2.

I run the code under sbt (the simple build tool for Scala), because the rest of the project will be written in Scala.

Here is my build.sbt:

name := "nlpexp"

version := "1.0"

scalaVersion := "2.12.10"

resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"

val stanford_corenlpV = "3.9.2"
val AppleJavaExtensionsV = "1.4"
val jollydayV = "0.4.9"
val commons_lang3V = "3.3.1"
val lucene_queryparserV = "4.10.3"
val lucene_analyzers_commonV = "4.10.3"
val lucene_queriesV = "4.10.3"
val lucene_coreV = "4.10.3"
val javax_servlet_apiV = "3.0.1"
val xomV = "1.2.10"
val joda_timeV = "2.9.4"
val ejmlV = "0.23"
val javax_jsonV = "1.0.4"
val slf4j_apiV = "1.7.12"
val protobuf_javaV = "3.2.0"
val junitV = "4.12"
val junit_quickcheck_coreV = "0.5"
val junit_quickcheck_generatorsV = "0.5"
val javax_activation_apiV = "1.2.0"
val jaxb_apiV = "2.4.0-b180830.0359"
val jaxb_coreV = "2.3.0.1"
val jaxb_implV = "2.4.0-b180830.0438"

val logbackVersion = "1.2.3"

libraryDependencies ++= Seq(
  "ch.qos.logback" % "logback-classic" % logbackVersion withSources() withJavadoc(), //
  "ch.qos.logback" % "logback-core" % logbackVersion withSources() withJavadoc(), //
  "ch.qos.logback" % "logback-access" % logbackVersion withSources() withJavadoc(), //

  "com.apple" % "AppleJavaExtensions" % AppleJavaExtensionsV withJavadoc(),
  "de.jollyday" % "jollyday" % jollydayV withSources() withJavadoc(),
  "org.apache.commons" % "commons-lang3" % commons_lang3V withSources() withJavadoc(),
  "org.apache.lucene" % "lucene-queryparser" % lucene_queryparserV withSources() withJavadoc(),
  "org.apache.lucene" % "lucene-analyzers-common" % lucene_analyzers_commonV withSources() withJavadoc(),
  "org.apache.lucene" % "lucene-queries" % lucene_queriesV withSources() withJavadoc(),
  "org.apache.lucene" % "lucene-core" % lucene_coreV withSources() withJavadoc(),
  "javax.servlet" % "javax.servlet-api" % javax_servlet_apiV withSources() withJavadoc(),
  "com.io7m.xom" % "xom" % xomV withSources() withJavadoc(),
  "joda-time" % "joda-time" % joda_timeV withSources() withJavadoc(),
  "com.googlecode.efficient-java-matrix-library" % "ejml" % ejmlV withSources() withJavadoc(),
  "org.glassfish" % "javax.json" % javax_jsonV withSources() withJavadoc(),
  "org.slf4j" % "slf4j-api" % slf4j_apiV withSources() withJavadoc(),
  "com.google.protobuf" % "protobuf-java" % protobuf_javaV withSources() withJavadoc(),
  "junit" % "junit" % junitV  % "test" withSources() withJavadoc(),
  "com.pholser" % "junit-quickcheck-core" % junit_quickcheck_coreV % "test" withSources() withJavadoc(),
  "com.pholser" % "junit-quickcheck-generators" % junit_quickcheck_generatorsV % "test" withSources() withJavadoc(),
  "javax.activation" % "javax.activation-api" % javax_activation_apiV withSources() withJavadoc(),
  "javax.xml.bind" % "jaxb-api" % jaxb_apiV withSources() withJavadoc(),
  "com.sun.xml.bind" % "jaxb-core" % jaxb_coreV withSources() withJavadoc(),
  "com.sun.xml.bind" % "jaxb-impl" % jaxb_implV withSources() withJavadoc(),
"edu.stanford.nlp" % "stanford-corenlp" % stanford_corenlpV withSources() withJavadoc()
)

scalacOptions += "-deprecation"

I also downloaded the following jars:

-rw-rw-r-- 1 jk jk  455928708 stanford-corenlp-models-current.jar
-rw-rw-r-- 1 jk jk 1238400073 stanford-english-corenlp-models-current.jar
-rw-rw-r-- 1 jk jk  473999694 stanford-english-kbp-corenlp-models-current.jar

They are in the unmanaged-libraries directory, so the build can find them.
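(As a side note on the unmanaged jars: sbt picks them up from the project's lib/ directory by default. If they lived elsewhere, the location could be set explicitly in build.sbt; this is a sketch using the standard unmanagedBase key, with the path being an assumption:)

```scala
// sbt resolves unmanaged jars from unmanagedBase, which defaults to
// the lib/ directory under the project root.
unmanagedBase := baseDirectory.value / "lib"
```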

The example code compiles and starts to run, but it fails when loading the pos annotator.

The output is:

[info] Running nlpexp.BasicPipelineExample 
Starting
Create pipeline
23:30:09.569 [run-main-0] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
23:30:09.587 [run-main-0] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
23:30:09.592 [run-main-0] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[error] (run-main-0) edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
[error] edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
[error]     at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:925)
[error]     at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:823)
[error]     at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:797)
[error]     at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:320)
[error]     at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:273)
[error]     at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:85)
[error]     at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:73)
[error]     at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(AnnotatorImplementations.java:53)
[error]     at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$3(StanfordCoreNLP.java:521)
[error]     at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$30(StanfordCoreNLP.java:602)
[error]     at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
[error]     at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
[error]     at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
[error]     at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:251)
[error]     at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:192)
[error]     at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:188)
[error]     at nlpexp.BasicPipelineExample.main(BasicPipelineExample.java:31)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]     at java.lang.reflect.Method.invoke(Method.java:498)
[error] Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.tagger.maxent.ExtractorNonAlphanumeric
[error]     at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
[error]     at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
[error]     at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
[error]     at java.lang.Class.forName0(Native Method)
[error]     at java.lang.Class.forName(Class.java:348)
[error]     at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:720)
[error]     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1923)
[error]     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1806)
[error]     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2097)
[error]     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
[error]     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2030)
[error]     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1613)
[error]     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2342)
[error]     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2266)
[error]     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2124)
[error]     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
[error]     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
[error]     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
[error]     at edu.stanford.nlp.tagger.maxent.MaxentTagger.readExtractors(MaxentTagger.java:628)
[error]     at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:874)
[error]     at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:823)
[error]     at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:797)
[error]     at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:320)
[error]     at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:273)
[error]     at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:85)
[error]     at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:73)
[error]     at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(AnnotatorImplementations.java:53)
[error]     at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$3(StanfordCoreNLP.java:521)
[error]     at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$30(StanfordCoreNLP.java:602)
[error]     at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
[error]     at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
[error]     at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
[error]     at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:251)
[error]     at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:192)
[error]     at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:188)
[error]     at nlpexp.BasicPipelineExample.main(BasicPipelineExample.java:31)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]     at java.lang.reflect.Method.invoke(Method.java:498)
[error] Nonzero exit code: 1
[error] (Compile / run) Nonzero exit code: 1
[error] Total time: 41 s, completed Feb 22, 2020 11:30:10 PM
sbt:nlpexp> 

When I remove all annotators except tokenize and ssplit and run the shortened example:

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,ssplit");
    // set a property for an annotator, in this case the coref annotator is being set to use the neural algorithm
    props.setProperty("coref.algorithm", "neural");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = new CoreDocument(text);
    // annotate the document
    pipeline.annotate(document);
    // examples

    // 10th token of the document
    CoreLabel token = document.tokens().get(10);
    System.out.println("Example: token");
    System.out.println(token);
    System.out.println();

    // text of the first sentence
    String sentenceText = document.sentences().get(0).text();
    System.out.println("Example: sentence");
    System.out.println(sentenceText);
    System.out.println();

    // second sentence
    CoreSentence sentence = document.sentences().get(1);
  }

the output is:

sbt:nlpexp> run
Starting
Create pipeline
23:45:47.255 [run-main-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
23:45:47.271 [run-main-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
Create doc
Annotate doc
Example: token
he-4

Example: sentence
Joe Smith was born in California.

So it seems that the tokenize and ssplit annotators load and run fine, but pos fails to load.

Am I still missing one or more jars, or what is the reason for this error?

Thank you for your support!


Solution

  • I faced the same problem and finally figured it out. The model jars linked from the main repo page at https://github.com/stanfordnlp/CoreNLP, e.g. http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar and its neighboring links, match the latest current code (i.e., the HEAD of the Git repository), not any specific release such as 3.9.2.

    To get the models for version 3.9.2, you have to take them from the corresponding models jar:

    <dependency>
      <groupId>edu.stanford.nlp</groupId>
      <artifactId>stanford-corenlp</artifactId>
      <version>3.9.2</version>
      <classifier>models</classifier>
    </dependency>
    

    If you don't want to include this (really fat) jar in your build, one option is to first build with this dependency included, find where the jar is installed in your local Maven repository (~/.m2/repository/edu/stanford/nlp/stanford-corenlp/3.9.2), extract (via jar xvf) the models you want from it, include those models in your build (e.g., by placing them under main/resources), and then comment out or remove the above dependency from your pom and rebuild.
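Since the question builds with sbt rather than Maven, the same models jar can be pulled in with a classifier on the dependency. This is a sketch of the equivalent build.sbt lines, using sbt's standard classifier syntax (not tested against this exact build):

```scala
// build.sbt: pull the 3.9.2 models jar alongside the code jar.
// The "models" classifier selects stanford-corenlp-3.9.2-models.jar.
libraryDependencies ++= Seq(
  "edu.stanford.nlp" % "stanford-corenlp" % "3.9.2",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.9.2" classifier "models"
)
```

With the models jar on the classpath, the pos annotator should find the serialized tagger classes that the ClassNotFoundException complains about.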