nlp, spacy, dependency-parsing

nlp: Is this dependency tag correct? What exactly does it mean in this situation?


I am exploring the amazing Python library spaCy and I got this:

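import spacy

# Assumption: the post does not say which model was loaded; en_core_web_sm is a guess.
nlp = spacy.load('en_core_web_sm')

text='The Titanic managed to sail into the coast  intact, and Conan went to Chicago.'
spacy_doc = nlp(text)

token_pos=[token.pos_ for token in spacy_doc]
token_tag=[token.tag_ for token in spacy_doc]
token_dep=[token.dep_ for token in spacy_doc]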

token_pos

['DET', 'PROPN', 'VERB', 'PART', 'VERB', 'ADP', 'DET', 'NOUN', 'SPACE', 'ADJ', 'PUNCT', 'CCONJ', 'PROPN', 'VERB', 'ADP', 'PROPN', 'PUNCT']

token_tag

['DT', 'NNP', 'VBD', 'TO', 'VB', 'IN', 'DT', 'NN', '_SP', 'JJ', ',', 'CC', 'NNP', 'VBD', 'IN', 'NNP', '.']

token_dep

['det', 'nsubj', 'ROOT', 'aux', 'xcomp', 'prep', 'det', 'pobj', '', 'advcl', 'punct', 'cc', 'nsubj', 'conj', 'prep', 'pobj', 'punct']

Tree

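from nltk import Tree

def to_nltk_tree(node):
    # Recursively convert a spaCy token and its dependents into an nltk Tree.
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        # A token with no dependents becomes a leaf.
        return node.orth_

[to_nltk_tree(sent.root).pretty_print() for sent in spacy_doc.sents]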

                    managed                                 
  _____________________|_________________________            
 |   |     |          sail                       |          
 |   |     |      _____|__________               |           
 |   |     |     |     |         into           went        
 |   |     |     |     |          |          ____|______     
 |   |  Titanic  |     |        coast       |    |      to  
 |   |     |     |     |      ____|____     |    |      |    
 ,  and   The    to  intact the           Conan  .   Chicago

QUESTIONS: I am puzzled by the dependency relation between "managed" and "went", which is labeled "conj".

(1) Is this a classification error? If so, what would the correct classification be? If not, can you explain why this happens?

(2) Is there a way to differentiate this case from the case below?

spaCy explains this label as "conjunct":

spacy.explain('conj')
Out[59]: 'conjunct'

According to the Stanford dependencies manual:

A conjunct is the relation between two elements connected by a coordinating conjunction, such as “and”, “or”, etc:

“Bill is big and honest”

“They either ski or snowboard”

conj(big, honest)

conj(ski, snowboard)

Look at this last sentence now:

text='They either ski or snowboard.'

spacy_doc = nlp(text)

token_pos=[token.pos_ for token in spacy_doc]
token_tag=[token.tag_ for token in spacy_doc]
token_dep=[token.dep_ for token in spacy_doc]

print(token_pos)
['PRON', 'CCONJ', 'VERB', 'CCONJ', 'NOUN', 'PUNCT']

print(token_tag)
['PRP', 'CC', 'VBP', 'CC', 'NN', '.']

print(token_dep)
['ROOT', 'preconj', 'appos', 'cc', 'conj', 'punct']

[to_nltk_tree(sent.root).pretty_print() for sent in spacy_doc.sents]
           They              
  __________|____             
 |              ski          
 |     __________|______      
 .  either       or snowboard

The dependency relation between "ski" and "snowboard" is also "conj", and in this case it seems to be the correct classification.


Solution

  • I think the answer lies within your question itself: "managed" and "went" are two elements connected by a coordinating conjunction ("and"), and that is what spaCy's output shows as well:

    text = 'The Titanic managed to sail into the coast  intact, and Conan went to Chicago.'
    
    spacy_doc = nlp(text)
    [(token.text, token.dep_) for token in spacy_doc]
    

    Output:

    [('The', 'det'),
     ('Titanic', 'nsubj'),
     ('managed', 'ROOT'),
     ('to', 'aux'),
     ('sail', 'xcomp'),
     ('into', 'prep'),
     ('the', 'det'),
     ('coast', 'pobj'),
     (' ', ''),
     ('intact', 'advmod'),
     (',', 'punct'),
     ('and', 'cc'),
     ('Conan', 'nsubj'),
     ('went', 'conj'),
     ('to', 'prep'),
     ('Chicago', 'pobj'),
     ('.', 'punct')]
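
    On question (2), one possible heuristic (my own suggestion, not something spaCy provides) is to check whether the "conj" token has its own subject. In the output above, "went" has its own nsubj ("Conan"), so it coordinates two full clauses, whereas a conjunct such as "snowboard" has no subject of its own. Below is a minimal sketch: conj_kind and FakeToken are hypothetical names, and FakeToken only stands in for a spaCy Token so the snippet runs without loading a model.

```python
from dataclasses import dataclass, field

# Subject labels in spaCy's English dependency scheme.
SUBJECT_DEPS = {"nsubj", "nsubjpass", "csubj", "csubjpass"}


def conj_kind(token):
    """Classify a 'conj' token by whether it has its own subject.

    Works on any object exposing .dep_ and .children, such as a spaCy Token.
    Returns None for tokens that are not labeled 'conj'.
    """
    if token.dep_ != "conj":
        return None
    has_own_subject = any(child.dep_ in SUBJECT_DEPS for child in token.children)
    return "clause" if has_own_subject else "phrase"


# Hypothetical stand-in for a spaCy Token (illustration only).
@dataclass
class FakeToken:
    text: str
    dep_: str
    children: list = field(default_factory=list)


# Mirrors the labels shown above: "went" governs its own nsubj "Conan".
went = FakeToken("went", "conj", [FakeToken("Conan", "nsubj"), FakeToken("to", "prep")])
# A conjunct with no dependents, like "snowboard" in the second sentence.
snowboard = FakeToken("snowboard", "conj")

print(conj_kind(went))       # clause
print(conj_kind(snowboard))  # phrase
```

    With a real parse, the same function could be applied as [(t.text, conj_kind(t)) for t in spacy_doc if t.dep_ == 'conj']. This is only a rough heuristic and it depends on the parser having attached the subject correctly.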