
Stanford CoreNLP - the egw4-reut.512.clusters cannot be found


I am using the Stanford CoreNLP package to annotate user comments. Since upgrading to version 3.5.0, I keep running into the same error:

Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ...

Loading distsim lexicon from /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters ...

java.lang.RuntimeException: java.io.FileNotFoundException: \u\nlp\data\pos_tags_are_useless\egw4-reut.512.clusters (The system cannot find the path specified) at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.setNextObject(ReaderIteratorFactory.java:225) (cue fifty lines of error)

A few searches here turned up two similar questions, "Stanford NER Error: Loading distsim lexicon Failed" and "Stanford NER tagger generates 'file not found' exception with provided models", but neither solved my issue: I am exclusively using code and models from 3.5.0 (via Maven Central). I also tried modifying the props file from the NER model to point to another .clusters file in a user directory, with no success (exact same error).

The code I use to instantiate the CoreNLP object is pretty standard too, but here it is:

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    stan = new StanfordCoreNLP(props);

Now I am thinking that there is something obvious that I am missing. Any help would be greatly appreciated.

A more complete stack trace (in case it helps) is as follows:

Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ...
Loading distsim lexicon from /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters ...
java.lang.RuntimeException: java.io.FileNotFoundException: \u\nlp\data\pos_tags_are_useless\egw4-reut.512.clusters (The system cannot find the path specified)
    at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.setNextObject(ReaderIteratorFactory.java:225)
    at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.<init>(ReaderIteratorFactory.java:161)
    at edu.stanford.nlp.objectbank.ReaderIteratorFactory.iterator(ReaderIteratorFactory.java:98)
    at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.<init>(ObjectBank.java:404)
    at edu.stanford.nlp.objectbank.ObjectBank.iterator(ObjectBank.java:242)
    at edu.stanford.nlp.ie.NERFeatureFactory.initLexicon(NERFeatureFactory.java:471)
    at edu.stanford.nlp.ie.NERFeatureFactory.init(NERFeatureFactory.java:379)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.reinit(AbstractSequenceClassifier.java:171)
    at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2630)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1620)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1675)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1662)
    at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2851)
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:189)
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:173)
    at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:113)
    at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:64)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$6.create(StanfordCoreNLP.java:617)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:267)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:129)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:125)
    at rgu.jclos.quilt.utilities.nlp.DependenciesTagger.<init>(DependenciesTagger.java:99)
    at rgu.jclos.quilt.eca.approaches.ApproachC_USS.main(ApproachC_USS.java:47)
Caused by: java.io.FileNotFoundException: \u\nlp\data\pos_tags_are_useless\egw4-reut.512.clusters (The system cannot find the path specified)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:131)
    at edu.stanford.nlp.io.EncodingFileReader.<init>(EncodingFileReader.java:78)
    at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.setNextObject(ReaderIteratorFactory.java:192)
    ... 23 more
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ...
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.FileNotFoundException
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$6.create(StanfordCoreNLP.java:621)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:267)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:129)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:125)
    at rgu.jclos.quilt.utilities.nlp.DependenciesTagger.<init>(DependenciesTagger.java:99)
    at rgu.jclos.quilt.eca.approaches.ApproachC_USS.main(ApproachC_USS.java:47)
Caused by: java.io.FileNotFoundException
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:199)
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:173)
    at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:113)
    at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:64)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$6.create(StanfordCoreNLP.java:617)
    ... 6 more
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to edu.stanford.nlp.classify.LinearClassifier
    at edu.stanford.nlp.ie.ner.CMMClassifier.loadClassifier(CMMClassifier.java:1070)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1620)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1675)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1662)
    at edu.stanford.nlp.ie.ner.CMMClassifier.getClassifier(CMMClassifier.java:1116)
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:195)
    ... 10 more

Solution

  • It turns out that the problem was caused by Maven.

    My code lived in a utility library that wraps Stanford CoreNLP to provide additional processing, and it worked perfectly well on its own. However, when I added this project as a dependency of my master project, Maven defaulted to importing version 3.4 of the Stanford CoreNLP models into the master project. The 3.5.0 code therefore ended up loading 3.4 models, which caused the error described above.
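
For anyone who suspects the same kind of version mismatch, one way to check is to ask the class loader where the NER model is actually coming from. The sketch below is only an illustration: `ModelLocator` and `locate` are hypothetical names (not part of CoreNLP), the resource path is copied from the "Loading classifier from ..." log line in the question, and it assumes the models jar sits on the ordinary application classpath. It uses only JDK calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of CoreNLP): lists every classpath location that supplies a resource. */
final class ModelLocator {

    // Resource path copied from the "Loading classifier from ..." log line in the question.
    static final String NER_MODEL =
            "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz";

    /** Returns the URLs of all copies of the resource visible to the system class loader. */
    static List<URL> locate(String resourcePath) {
        try {
            return Collections.list(
                    ClassLoader.getSystemClassLoader().getResources(resourcePath));
        } catch (IOException e) {
            // Wrap so callers do not have to handle a checked exception.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<URL> hits = locate(NER_MODEL);
        if (hits.isEmpty()) {
            System.out.println("No jar on the classpath supplies " + NER_MODEL);
        }
        for (URL url : hits) {
            // Each URL includes the path of the jar that supplies the model,
            // which normally makes the resolved models version visible.
            System.out.println(url);
        }
    }
}
```

Run from the master project, this should print the jar that supplies the model; if the file name of that jar carries a different version than the CoreNLP code jar, the two are out of sync. Running `mvn dependency:tree` on the master project should show the same thing from Maven's side.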