Tags: python, nlp, stanford-nlp

What's the difference between the new StanfordNLP native Python package and the Python wrapper to CoreNLP?


Can anyone shed some light on the difference between the neural pipeline used in the new native Python StanfordNLP package: https://stanfordnlp.github.io/stanfordnlp/

and the Python wrapper to the Java CoreNLP package https://stanfordnlp.github.io/CoreNLP/?

Are these two different implementations? I saw that the StanfordNLP package has native neural implementations but also includes a wrapper to the CoreNLP package, and I was wondering why the wrapper would still be needed if everything has been migrated to Python anyway.


Solution

  • The two are completely distinct systems. The Python-native neural pipeline roughly corresponds to Universal Dependency Parsing from Scratch, with the TensorFlow parser used there reproduced in PyTorch. It provides a fully neural pipeline for many languages, from sentence splitting through dependency parsing, exploiting Universal Dependencies (UD) resources. However, it doesn't (at present) support other components such as NER, coreference, relation extraction, open IE, or hand-written pattern matching, and its models are trained only on UD resources (see the first sketch below).
  • CoreNLP, which you can use through this or other Python wrappers, does provide all of those other components for a handful of languages, and some of its models, including English, are trained on much more data. It has the advantages and disadvantages of many pre-neural components: a fast tokenizer, but purely heuristic sentence splitting (see the second sketch below).
  • Most likely, if you're working with formal English text, you'll currently still do better with CoreNLP. In many other circumstances, you'll do better with the Python stanfordnlp.
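Two short sketches may make the distinction concrete. First, the native neural pipeline in the stanfordnlp package, which runs entirely in Python/PyTorch. This is a minimal sketch assuming the English UD models have been downloaded; the example sentence is arbitrary:

```python
import stanfordnlp

# One-time download of the English UD-trained models.
stanfordnlp.download('en')

# Fully neural pipeline: tokenization, MWT, POS, lemma, dependency parse.
nlp = stanfordnlp.Pipeline(lang='en')
doc = nlp("Barack Obama was born in Hawaii.")

for sentence in doc.sentences:
    for word in sentence.words:
        # governor is the 1-based index of the head word (0 = root).
        print(word.text, word.upos, word.governor, word.dependency_relation)
```

Second, the CoreNLPClient wrapper bundled in the same package, which talks to a running Java CoreNLP server and therefore exposes the extra annotators (NER, coref, etc.). This sketch assumes a CoreNLP distribution is unpacked locally and the CORENLP_HOME environment variable points at it:

```python
from stanfordnlp.server import CoreNLPClient

# Starts (and later shuts down) a Java CoreNLP server in the background.
with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'ner'],
                   timeout=30000, memory='4G') as client:
    ann = client.annotate("Chris Manning teaches at Stanford University.")
    for token in ann.sentence[0].token:
        # NER tags come from CoreNLP's English models, not from UD data.
        print(token.word, token.pos, token.ner)
```

The wrapper exists precisely because only the UD-style components were reimplemented in Python; anything beyond those still requires the Java toolkit.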