Stanford CoreNLP contains several models for parsing English sentences.
There are some comparisons in the following papers:
I couldn't find a full description and comparison of all the models. Does one exist anywhere? If not, I think it would be worth creating.
I can't give a full list (maybe Chris will chime in?), but my understanding is that these models are:
englishSR
: The shift-reduce model, trained on various standard treebanks plus some of Stanford's hand-annotated data. This is the fastest and most accurate constituency model we have, but the model itself is very large, so it is memory-hungry and slow to load.
english_SD
: The NN Dependency Parser model for Stanford Dependencies. Deprecated in favor of english_UD, the Universal Dependencies model.
english_UD
: The NN Dependency Parser model for Universal Dependencies. This is the fastest and most accurate way to get dependency trees, but it won't give you constituency parses.
englishRNN
: The hybrid PCFG + Neural constituency parser model. More accurate than any of the constituency parsers other than the shift-reduce model, but also noticeably slower.
englishFactored
: Not 100% sure what this is, but my impression is that, in both accuracy and speed, it falls between englishPCFG and englishRNN.
englishPCFG
: A regular old PCFG model for constituency parsing. Fast to load, and faster than any of the constituency models other than the shift-reduce model, but its accuracy is kind of mediocre by modern standards. Nonetheless, it's a good default.
englishPCFG.caseless
: A caseless version of the PCFG model.
I assume the wsj* models are there to reproduce numbers in papers (trained on the proper WSJ splits), but again I'm not 100% sure what they are.
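If you want to try one of these constituency models, you can point the parse annotator at a specific model file via the parse.model property. Here is a minimal sketch, assuming the standard model paths shipped in the English models jar (adjust the path if your version packages a model elsewhere; englishSR in particular has sometimes shipped in a separate jar):

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

import java.util.Properties;

public class ParserModelDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,parse");
    // Swap in the shift-reduce model; leave parse.model unset to get the default englishPCFG.
    props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("The quick brown fox jumps over the lazy dog.");
    pipeline.annotate(doc);
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      // Constituency tree produced by whichever parse model was configured above.
      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      System.out.println(tree.pennString());
    }
  }
}
```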
To help choose the right model based on speed, accuracy, and the base memory used by the model:
Fast: 10x, accurate, high-memory: englishSR
Medium: 1x, ok accuracy, low-memory: englishPCFG
Slow: ~0.25x, accurate, low-memory: englishRNN
Fast: 100x, accurate, low-memory, dependency parses only: english_UD
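And if all you need are dependency parses, you can skip the parse annotator entirely and use depparse, which loads the english_UD neural model by default. A minimal sketch, again assuming the standard model path in the English models jar (setting depparse.model is optional here since english_UD is already the default):

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

import java.util.Properties;

public class DepParseDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,depparse");
    // depparse defaults to english_UD; set it explicitly just to make the choice visible.
    props.setProperty("depparse.model", "edu/stanford/nlp/models/parser/nndep/english_UD.gz");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("Stanford CoreNLP parses this sentence quickly.");
    pipeline.annotate(doc);
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      // Universal Dependencies graph for the sentence; no constituency tree is produced.
      SemanticGraph deps =
          sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
      System.out.println(deps);
    }
  }
}
```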