Tags: nlp, stanford-nlp, spacy, dependency-parsing

StanfordNLP, CoreNLP, spaCy - different dependency graphs


I'm trying to use simple rules/patterns defined over a dependency graph to extract very basic information from sentences (e.g., triples such as subject->predicate->object). I started with StanfordNLP since it was easy to set up and utilizes the GPU for better performance. However, I've noticed that for some sentences the resulting dependency graph did not look the way I would have expected, though I'm no expert. I therefore tried two other solutions: spaCy and Stanford CoreNLP (I understand that these are maintained by different groups?).

For the example sentence "Tom made Sam believe that Alice has cancer." I've printed the dependencies for all three approaches. CoreNLP and spaCy yield the same dependencies, and these differ from those of StanfordNLP. Hence, I'm inclined to switch to CoreNLP or spaCy (another advantage would be that they come with NER out of the box).

Does anyone have more experience or feedback that would help me decide where to go from here? I don't expect CoreNLP and spaCy to always yield the same dependency graphs, but in the example sentence, treating Sam as obj of "made" (StanfordNLP) rather than as nsubj of "believe" (CoreNLP, spaCy) seems to be a significant difference.
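
To make the effect of that one edge concrete, here is a minimal sketch of a naive subject->predicate->object rule applied to both analyses of the example sentence. The edge lists are copied from the listings in this question; the function name `extract_triples` and the rule itself (a head that has both a subject child and an object child) are assumptions made for illustration, not part of any of the three libraries.

```python
from collections import defaultdict

# Edges are (token, dependency_tag, parent_token), copied from the listings.
# Tokens are identified by their text, which only works because every word
# in this sentence is unique; real code should use token indices instead.
STANFORDNLP = [
    ("Tom", "nsubj", "made"), ("made", "ROOT", "ROOT"), ("Sam", "obj", "made"),
    ("believe", "ccomp", "made"), ("that", "mark", "has"),
    ("Alice", "nsubj", "has"), ("has", "ccomp", "believe"),
    ("cancer", "obj", "has"), (".", "punct", "made"),
]
CORENLP_SPACY = [
    ("Tom", "nsubj", "made"), ("made", "ROOT", "ROOT"), ("Sam", "nsubj", "believe"),
    ("believe", "ccomp", "made"), ("that", "mark", "has"),
    ("Alice", "nsubj", "has"), ("has", "ccomp", "believe"),
    ("cancer", "dobj", "has"), (".", "punct", "made"),
]

SUBJECT_TAGS = ("nsubj",)
OBJECT_TAGS = ("obj", "dobj")  # both spellings of the direct-object label


def extract_triples(edges):
    """Return (subject, predicate, object) for every head that has both
    a subject child and an object child (hypothetical, deliberately naive rule)."""
    children = defaultdict(lambda: defaultdict(list))
    for token, tag, parent in edges:
        children[parent][tag].append(token)

    triples = []
    for head, by_tag in children.items():
        subjects = [t for tag in SUBJECT_TAGS for t in by_tag.get(tag, [])]
        objects = [t for tag in OBJECT_TAGS for t in by_tag.get(tag, [])]
        triples.extend((s, head, o) for s in subjects for o in objects)
    return triples


print(extract_triples(STANFORDNLP))    # includes a (Tom, made, Sam) triple
print(extract_triples(CORENLP_SPACY))  # only the (Alice, has, cancer) triple
```

With this rule, the StanfordNLP analysis produces an extra triple (Tom, made, Sam), while the CoreNLP/spaCy analysis leaves "Sam believe ..." as a subject without a direct object, so the choice of parser changes what a pattern-based extractor returns.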

Format:
token   dependency_tag   parent_token

StanfordNLP
Tom     nsubj   made
made    ROOT    ROOT
Sam     obj     made
believe ccomp   made
that    mark    has
Alice   nsubj   has
has     ccomp   believe
cancer  obj     has
.       punct   made

CoreNLP
Tom     nsubj   made
made    ROOT    ROOT
Sam     nsubj   believe
believe ccomp   made
that    mark    has
Alice   nsubj   has
has     ccomp   believe
cancer  dobj    has
.       punct   made

spaCy
Tom     nsubj   made
made    ROOT    ROOT
Sam     nsubj   believe
believe ccomp   made
that    mark    has
Alice   nsubj   has
has     ccomp   believe
cancer  dobj    has
.       punct   made
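
The comparison above can also be done mechanically. The following is a rough sketch that parses the three listings and reports only the tokens whose (tag, parent) pair differs. It assumes that `dobj` and `obj` name the same relation under different label schemes (older Stanford-style labels versus Universal Dependencies v2), so they are normalized before comparing. The helper names `parse_listing`, `normalize`, and `diff_parses` are made up for this example.

```python
STANFORDNLP = """
Tom     nsubj   made
made    ROOT    ROOT
Sam     obj     made
believe ccomp   made
that    mark    has
Alice   nsubj   has
has     ccomp   believe
cancer  obj     has
.       punct   made
"""

CORENLP = """
Tom     nsubj   made
made    ROOT    ROOT
Sam     nsubj   believe
believe ccomp   made
that    mark    has
Alice   nsubj   has
has     ccomp   believe
cancer  dobj    has
.       punct   made
"""

SPACY = CORENLP  # the two listings in the question are identical

# Assumption: these labels denote the same relation in different schemes.
LABEL_ALIASES = {"dobj": "obj"}


def parse_listing(text):
    """Turn 'token tag parent' lines into {token: (tag, parent)}.
    Keyed by token text, which is fine here because no word repeats."""
    parsed = {}
    for line in text.strip().splitlines():
        token, tag, parent = line.split()
        parsed[token] = (tag, parent)
    return parsed


def normalize(parsed):
    """Map aliased labels onto one spelling so only real differences remain."""
    return {tok: (LABEL_ALIASES.get(tag, tag), parent)
            for tok, (tag, parent) in parsed.items()}


def diff_parses(a, b):
    """Return {token: (analysis_in_a, analysis_in_b)} for differing tokens."""
    a, b = normalize(a), normalize(b)
    return {tok: (a[tok], b.get(tok)) for tok in a if a[tok] != b.get(tok)}


stanfordnlp, corenlp, spacy = map(parse_listing, (STANFORDNLP, CORENLP, SPACY))
print(diff_parses(stanfordnlp, corenlp))  # only the "Sam" edge differs
print(diff_parses(corenlp, spacy))        # no differences
```

After normalizing the object label, the only disagreement is the attachment of "Sam"; the `obj`/`dobj` difference on "cancer" is a naming difference, not a structural one.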

Solution

  • I'm not sure how to address your questions directly, but I'd recommend carefully reading the Stanford parser documentation: https://nlp.stanford.edu/software/lex-parser.shtml

    The package contains several grammatical (constituency) and dependency parsers that you can use. Looking only at the grammatical parses, there is an option to retrieve the k-best parses, and if you derive dependencies from each of them you will most likely get a different set of dependencies for each.

    This is due both to inaccuracies in the parser and to genuine ambiguities in natural language.