Stanford typed dependencies using coreNLP in python


The Stanford Dependencies manual describes "Stanford typed dependencies", and in particular the type "neg" (negation modifier). This type is also produced by the Stanford enhanced++ parser on the website. For example, take the sentence:

"Barack Obama was not born in Hawaii"

The parser does indeed find neg(born,not).

But when I use the stanfordnlp Python library, the only dependency parser I can get parses the sentence as follows:

('Barack', '5', 'nsubj:pass')

('Obama', '1', 'flat')

('was', '5', 'aux:pass')

('not', '5', 'advmod')

('born', '0', 'root')

('in', '7', 'case')

('Hawaii', '5', 'obl')

and the code that generates it:

import stanfordnlp
stanfordnlp.download('en')  
nlp = stanfordnlp.Pipeline()
doc = nlp("Barack Obama was not born in Hawaii")
a  = doc.sentences[0]
a.print_dependencies()

Is there a way to get results similar to the enhanced dependency parser, or to any other Stanford parser that produces typed dependencies including the negation modifier?


Solution

  • Note that the Python library stanfordnlp is not just a Python wrapper for Stanford CoreNLP.

    1. Difference StanfordNLP / CoreNLP

    As stated in the stanfordnlp GitHub repo:

    The Stanford NLP Group's official Python NLP library. It contains packages for running our latest fully neural pipeline from the CoNLL 2018 Shared Task and for accessing the Java Stanford CoreNLP server.

    stanfordnlp contains a new set of neural network models, trained on the CoNLL 2018 shared task data. The online parser is based on the CoreNLP 3.9.2 Java library. These are two different pipelines and sets of models, as explained here.

    Your code only accesses the neural pipeline trained on the CoNLL 2018 data. This explains the differences you saw compared to the online version: they are two different models.

    What adds to the confusion, I believe, is that both repositories belong to the GitHub user stanfordnlp (which is the team name). Don't confuse the Java stanfordnlp/CoreNLP with the Python stanfordnlp/stanfordnlp.

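    Concerning your 'neg' issue, it seems that the Python library stanfordnlp labels negation with a plain 'advmod' relation. At least that is what I saw for a few example sentences. This is consistent with Universal Dependencies v2, the annotation scheme behind the CoNLL 2018 data, which dropped the 'neg' relation in favour of 'advmod'.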
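
    If all you need is the negation marker, one possible workaround is to relabel 'advmod' edges whose dependent is a negation word. Below is a minimal sketch on the (word, head, relation) tuples printed in the question. The names NEGATION_WORDS and relabel_negations are hypothetical, and the word list is my own assumption, not something stanfordnlp defines.

```python
# Hypothetical helper, not part of stanfordnlp.
# Assumption: a small hand-picked set of English negation words.
NEGATION_WORDS = {"not", "n't", "never"}

def relabel_negations(dependencies):
    """Return a copy of (word, head, relation) tuples in which 'advmod'
    edges on negation words are relabelled as 'neg'."""
    relabelled = []
    for word, head, relation in dependencies:
        if relation == "advmod" and word.lower() in NEGATION_WORDS:
            relation = "neg"
        relabelled.append((word, head, relation))
    return relabelled

# The parse printed by the neural pipeline in the question.
parse = [
    ("Barack", "5", "nsubj:pass"),
    ("Obama", "1", "flat"),
    ("was", "5", "aux:pass"),
    ("not", "5", "advmod"),
    ("born", "0", "root"),
    ("in", "7", "case"),
    ("Hawaii", "5", "obl"),
]

for dep in relabel_negations(parse):
    print(dep)
```

    This only patches the label after parsing; it does not give you the other relations of the older Stanford Dependencies scheme.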

    2. Using CoreNLP via stanfordnlp package

    However, you can still access CoreNLP through the stanfordnlp package. It requires a few more steps, though. Citing the GitHub repo:

    There are a few initial setup steps.

    • Download Stanford CoreNLP and models for the language you wish to use. (you can download CoreNLP and the language models here)
    • Put the model jars in the distribution folder
    • Tell the python code where Stanford CoreNLP is located: export CORENLP_HOME=/path/to/stanford-corenlp-full-2018-10-05
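
    The last step can also be done from Python before the client is created. This is a minimal sketch, assuming it is enough to set the variable for the current process (as far as I know the client reads CORENLP_HOME from the environment when it starts the server). check_corenlp_home is a hypothetical helper, not part of stanfordnlp.

```python
import glob
import os

def check_corenlp_home(path):
    """Set CORENLP_HOME for this process and return the jar files found there.

    Hypothetical helper: raises FileNotFoundError if the folder does not
    exist or contains no .jar files, so a wrong path fails early.
    """
    if not os.path.isdir(path):
        raise FileNotFoundError("CoreNLP folder not found: " + path)
    jars = sorted(glob.glob(os.path.join(path, "*.jar")))
    if not jars:
        raise FileNotFoundError("No .jar files found in " + path)
    os.environ["CORENLP_HOME"] = path
    return jars

# Usage, with the placeholder path from the steps above:
# check_corenlp_home("/path/to/stanford-corenlp-full-2018-10-05")
```

    Checking for the jars up front is a design choice: a missing or wrong folder otherwise tends to show up only as a server that fails to start.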

    Once that is done, you can start a client with code that can be found in the demo:

    from stanfordnlp.server import CoreNLPClient
    
    # the example sentence from the question
    text = "Barack Obama was not born in Hawaii."
    
    with CoreNLPClient(annotators=['tokenize','ssplit','pos','depparse'], timeout=60000, memory='16G') as client:
        # submit the request to the server
        ann = client.annotate(text)
    
        # get the first sentence
        sentence = ann.sentence[0]
    
        # get the dependency parse of the first sentence
        print('---')
        print('dependency parse of first sentence')
        dependency_parse = sentence.basicDependencies
        print(dependency_parse)
    
        #get the tokens of the first sentence
        #note that 1 token is 1 node in the parse tree, nodes start at 1
        print('---')
        print('Tokens of first sentence')
        for token in sentence.token :
            print(token)
    

    Your sentence will therefore be parsed if you specify the 'depparse' annotator (as well as the prerequisite annotators tokenize, ssplit, and pos). Reading the demo, it seems that we can only access basicDependencies. I have not managed to make Enhanced++ dependencies work via stanfordnlp.

    But the negations will still appear if you use basicDependencies!

    Here is the output I obtained using stanfordnlp and your example sentence. It is a DependencyGraph object, which is not pretty, but that is unfortunately always the case when we use the lower-level CoreNLP tools. You will see that between nodes 5 and 4 ('born' and 'not') there is an edge 'neg'.

    node {
      sentenceIndex: 0
      index: 1
    }
    node {
      sentenceIndex: 0
      index: 2
    }
    node {
      sentenceIndex: 0
      index: 3
    }
    node {
      sentenceIndex: 0
      index: 4
    }
    node {
      sentenceIndex: 0
      index: 5
    }
    node {
      sentenceIndex: 0
      index: 6
    }
    node {
      sentenceIndex: 0
      index: 7
    }
    node {
      sentenceIndex: 0
      index: 8
    }
    edge {
      source: 2
      target: 1
      dep: "compound"
      isExtra: false
      sourceCopy: 0
      targetCopy: 0
      language: UniversalEnglish
    }
    edge {
      source: 5
      target: 2
      dep: "nsubjpass"
      isExtra: false
      sourceCopy: 0
      targetCopy: 0
      language: UniversalEnglish
    }
    edge {
      source: 5
      target: 3
      dep: "auxpass"
      isExtra: false
      sourceCopy: 0
      targetCopy: 0
      language: UniversalEnglish
    }
    edge {
      source: 5
      target: 4
      dep: "neg"
      isExtra: false
      sourceCopy: 0
      targetCopy: 0
      language: UniversalEnglish
    }
    edge {
      source: 5
      target: 7
      dep: "nmod"
      isExtra: false
      sourceCopy: 0
      targetCopy: 0
      language: UniversalEnglish
    }
    edge {
      source: 5
      target: 8
      dep: "punct"
      isExtra: false
      sourceCopy: 0
      targetCopy: 0
      language: UniversalEnglish
    }
    edge {
      source: 7
      target: 6
      dep: "case"
      isExtra: false
      sourceCopy: 0
      targetCopy: 0
      language: UniversalEnglish
    }
    root: 5
    
    ---
    Tokens of first sentence
    word: "Barack"
    pos: "NNP"
    value: "Barack"
    before: ""
    after: " "
    originalText: "Barack"
    beginChar: 0
    endChar: 6
    tokenBeginIndex: 0
    tokenEndIndex: 1
    hasXmlContext: false
    isNewline: false
    
    word: "Obama"
    pos: "NNP"
    value: "Obama"
    before: " "
    after: " "
    originalText: "Obama"
    beginChar: 7
    endChar: 12
    tokenBeginIndex: 1
    tokenEndIndex: 2
    hasXmlContext: false
    isNewline: false
    
    word: "was"
    pos: "VBD"
    value: "was"
    before: " "
    after: " "
    originalText: "was"
    beginChar: 13
    endChar: 16
    tokenBeginIndex: 2
    tokenEndIndex: 3
    hasXmlContext: false
    isNewline: false
    
    word: "not"
    pos: "RB"
    value: "not"
    before: " "
    after: " "
    originalText: "not"
    beginChar: 17
    endChar: 20
    tokenBeginIndex: 3
    tokenEndIndex: 4
    hasXmlContext: false
    isNewline: false
    
    word: "born"
    pos: "VBN"
    value: "born"
    before: " "
    after: " "
    originalText: "born"
    beginChar: 21
    endChar: 25
    tokenBeginIndex: 4
    tokenEndIndex: 5
    hasXmlContext: false
    isNewline: false
    
    word: "in"
    pos: "IN"
    value: "in"
    before: " "
    after: " "
    originalText: "in"
    beginChar: 26
    endChar: 28
    tokenBeginIndex: 5
    tokenEndIndex: 6
    hasXmlContext: false
    isNewline: false
    
    word: "Hawaii"
    pos: "NNP"
    value: "Hawaii"
    before: " "
    after: ""
    originalText: "Hawaii"
    beginChar: 29
    endChar: 35
    tokenBeginIndex: 6
    tokenEndIndex: 7
    hasXmlContext: false
    isNewline: false
    
    word: "."
    pos: "."
    value: "."
    before: ""
    after: ""
    originalText: "."
    beginChar: 35
    endChar: 36
    tokenBeginIndex: 7
    tokenEndIndex: 8
    hasXmlContext: false
    isNewline: false
    

    3. Using CoreNLP via NLTK package

    I will not go into detail on this one, but you can also access the CoreNLP server via the NLTK library, if all else fails. It does output the negations, but requires a little more work to start the server. Details on this page.

    EDIT

    I figured I could also share the code that turns the DependencyGraph into a nice list of (dependency, argument1, argument2) tuples, in a shape similar to what stanfordnlp outputs.

    from stanfordnlp.server import CoreNLPClient
    
    text = "Barack Obama was not born in Hawaii."
    
    # set up the client
    with CoreNLPClient(annotators=['tokenize','ssplit','pos','depparse'], timeout=60000, memory='16G') as client:
        # submit the request to the server
        ann = client.annotate(text)
    
        # get the first sentence
        sentence = ann.sentence[0]
    
        # get the dependency parse of the first sentence
        dependency_parse = sentence.basicDependencies
    
        #print(dir(sentence.token[0])) #to find all the attributes and methods of a Token object
        #print(dir(dependency_parse)) #to find all the attributes and methods of a DependencyGraph object
        #print(dir(dependency_parse.edge))
    
        # map each node index to its word (nodes start at 1, which matches tokenEndIndex)
        token_dict = {}
        for token in sentence.token:
            token_dict[token.tokenEndIndex] = token.word
    
        # get a list of the dependencies with the words they connect
        list_dep = []
        for edge in dependency_parse.edge:
            source_name = token_dict[edge.source]
            target_name = token_dict[edge.target]
    
            list_dep.append((edge.dep,
                str(edge.source) + '-' + source_name,
                str(edge.target) + '-' + target_name))
        print(list_dep)
    

    It outputs the following:

    [('compound', '2-Obama', '1-Barack'), ('nsubjpass', '5-born', '2-Obama'), ('auxpass', '5-born', '3-was'), ('neg', '5-born', '4-not'), ('nmod', '5-born', '7-Hawaii'), ('punct', '5-born', '8-.'), ('case', '7-Hawaii', '6-in')]
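
    From a list in that shape, pulling out the negated words is a one-line filter. A small sketch, using the list printed above as input; negated_words is a hypothetical helper name.

```python
# The list of (dependency, source, target) tuples printed above.
list_dep = [
    ('compound', '2-Obama', '1-Barack'),
    ('nsubjpass', '5-born', '2-Obama'),
    ('auxpass', '5-born', '3-was'),
    ('neg', '5-born', '4-not'),
    ('nmod', '5-born', '7-Hawaii'),
    ('punct', '5-born', '8-.'),
    ('case', '7-Hawaii', '6-in'),
]

def negated_words(dependencies):
    """Return the governor of every 'neg' edge (hypothetical helper)."""
    return [source for dep, source, target in dependencies if dep == 'neg']

print(negated_words(list_dep))  # prints ['5-born']
```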