stanford-nlp

How to get training data and models of Stanford CoreNLP?


I downloaded Stanford CoreNLP from the official website and GitHub.

The guides state:

On the Stanford NLP machines, training data is available in /u/nlp/data/depparser/nn/data

or HERE

The list of models currently distributed is:

edu/stanford/nlp/models/parser/nndep/english_UD.gz (default, English, Universal Dependencies)

It may sound like a silly question, but I cannot find these files or folders in any distribution.

Where can I find the source data and models officially distributed with Stanford CoreNLP?


Solution

  • We don't distribute most of the CoreNLP training data. Much of it is non-free, licensed data produced by other organizations (such as the LDC: https://www.ldc.upenn.edu/).

    However, many free dependency treebanks are available through the Universal Dependencies project: https://universaldependencies.org/.

    All the Stanford CoreNLP models are distributed inside the "models" jar files. edu/stanford/nlp/models/parser/nndep/english_UD.gz is in stanford-corenlp-3.9.2-models.jar, which is included in the zip download http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip and is also available on Maven here: http://central.maven.org/maven2/edu/stanford/nlp/stanford-parser/3.9.2/.
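    Since a models jar is just a zip archive, you can verify that a model is present without any CoreNLP tooling. A minimal sketch (the jar filename below is the one from the 3.9.2 release mentioned above; adjust the path to wherever you unpacked the download):

    ```python
    import zipfile

    def find_models(jar_path, needle="nndep"):
        """Return all entries in the jar (a zip archive) whose path contains `needle`."""
        with zipfile.ZipFile(jar_path) as jar:
            return [name for name in jar.namelist() if needle in name]

    # Example usage (assumes the models jar is in the current directory):
    # find_models("stanford-corenlp-3.9.2-models.jar")
    # The result should include "edu/stanford/nlp/models/parser/nndep/english_UD.gz".
    ```

    Equivalently, `unzip -l stanford-corenlp-3.9.2-models.jar` lists the same entries from the command line.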