I am exploring the amazing spaCy Python library and got this:
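import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: the post does not say which English model was used
# Two spaces before "intact" produce the SPACE token seen in the output below
text = 'The Titanic managed to sail into the coast  intact, and Conan went to Chicago.'
spacy_doc = nlp(text)
token_pos = [token.pos_ for token in spacy_doc]
token_tag = [token.tag_ for token in spacy_doc]
token_dep = [token.dep_ for token in spacy_doc]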
token_pos
['DET', 'PROPN', 'VERB', 'PART', 'VERB', 'ADP', 'DET', 'NOUN', 'SPACE', 'ADJ', 'PUNCT', 'CCONJ', 'PROPN', 'VERB', 'ADP', 'PROPN', 'PUNCT']
token_tag
['DT', 'NNP', 'VBD', 'TO', 'VB', 'IN', 'DT', 'NN', '_SP', 'JJ', ',', 'CC', 'NNP', 'VBD', 'IN', 'NNP', '.']
token_dep
['det', 'nsubj', 'ROOT', 'aux', 'xcomp', 'prep', 'det', 'pobj', '', 'advcl', 'punct', 'cc', 'nsubj', 'conj', 'prep', 'pobj', 'punct']
from nltk import Tree

def to_nltk_tree(node):
    # Tokens with dependents become subtrees; leaf tokens become plain strings
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_

for sent in spacy_doc.sents:
    to_nltk_tree(sent.root).pretty_print()
[pretty_print tree drawing, column alignment lost: the root is "managed", with dependents "Titanic", "sail", ",", "and" and "went". "Titanic" heads "The"; "sail" heads "to", "into" and "intact"; "into" heads "coast", which heads "the"; "went" heads "Conan", "." and "to", which heads "Chicago".]
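The drawing does not show relation labels, so it is hard to see from it that "went" hangs off "managed" as conj. A minimal sketch, assuming only that tokens expose `.text`, `.dep_` and `.children` as spaCy's `Token` does, writes each node as `word/label` in bracket notation instead (`to_labeled_bracket` is a hypothetical helper, not part of spaCy or NLTK):

```python
def to_labeled_bracket(node):
    """Return a bracketed string for the subtree under `node`, writing each
    word as text/dep so labels such as 'conj' stay visible.

    Assumes spaCy-like tokens with .text, .dep_ and .children.
    Whitespace-only tokens are skipped.
    """
    label = f"{node.text}/{node.dep_}"
    children = [
        to_labeled_bracket(child)
        for child in node.children
        if child.text.strip()  # drop SPACE tokens such as the double space
    ]
    if not children:
        return label  # a leaf is just its labeled word
    return "(" + label + " " + " ".join(children) + ")"
```

Calling `print(to_labeled_bracket(sent.root))` for each `sent` in `spacy_doc.sents` prints one string per sentence. If you still want the ASCII drawing, the string can be passed to `nltk.Tree.fromstring(...)` and then `.pretty_print()`, as long as no token text contains parentheses.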
QUESTIONS: I am puzzled by the dependency relation between "managed" and "went": it is "conj". (1) Is this a classification error? If so, what would the correct label be? If not, can you explain why this is happening? (2) Is there a way to differentiate this case from the case below? spaCy explains the label as "conjunct":
spacy.explain('conj')
Out[59]: 'conjunct'
According to the Stanford dependencies manual:
A conjunct is the relation between two elements connected by a coordinating conjunction, such as “and”, “or”, etc:
“Bill is big and honest”
“They either ski or snowboard”
conj(big, honest)
conj(ski, snowboard)
Now look at the manual's last example sentence:
text='They either ski or snowboard.'
spacy_doc = nlp(text)
token_pos=[token.pos_ for token in spacy_doc]
token_tag=[token.tag_ for token in spacy_doc]
token_dep=[token.dep_ for token in spacy_doc]
print(token_pos)
['PRON', 'CCONJ', 'VERB', 'CCONJ', 'NOUN', 'PUNCT']
print(token_tag)
['PRP', 'CC', 'VBP', 'CC', 'NN', '.']
print(token_dep)
['ROOT', 'preconj', 'appos', 'cc', 'conj', 'punct']
for sent in spacy_doc.sents:
    to_nltk_tree(sent.root).pretty_print()
[pretty_print tree drawing, column alignment lost: the root is "They", with dependents "ski" and "."; "ski" heads "either", "or" and "snowboard".]
The dependency relation between "ski" and "snowboard" is also "conj", and in this case it seems to be the correct classification.
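One way to line spaCy's output up against the manual's notation is to print every token as `relation(head, dependent)`, reading the head from `Token.head`. This is a sketch under stated assumptions: `stanford_triples` is a hypothetical helper, it only relies on tokens exposing `.text`, `.dep_` and `.head` (as spaCy tokens do), and it prints spaCy's labels, which are not identical to the Stanford label set.

```python
def stanford_triples(tokens):
    """Render each token as 'relation(head, dependent)', the notation used
    in the Stanford dependencies manual, e.g. conj(big, honest).

    Works with anything that yields objects exposing .text, .dep_ and .head,
    such as a spaCy Doc. spaCy makes the root its own head, so the root is
    written with the pseudo-head 'ROOT' instead.
    """
    triples = []
    for token in tokens:
        if not token.dep_:  # whitespace tokens had an empty label in the output above
            continue
        head = "ROOT" if token.dep_ == "ROOT" else token.head.text
        triples.append(f"{token.dep_}({head}, {token.text})")
    return triples
```

Applied to `spacy_doc`, the list for the Titanic sentence should contain `conj(managed, went)` and the one for the ski sentence `conj(ski, snowboard)`, the same shape as the manual's examples.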
I think the answer lies within your question itself: "managed" and "went" are two elements connected by a coordinating conjunction ("and"), and that is what spaCy's output shows as well:
text = 'The Titanic managed to sail into the coast intact, and Conan went to Chicago.'
spacy_doc = nlp(text)
[(token.text, token.dep_) for token in spacy_doc]
Output:
[('The', 'det'),
('Titanic', 'nsubj'),
('managed', 'ROOT'),
('to', 'aux'),
('sail', 'xcomp'),
('into', 'prep'),
('the', 'det'),
('coast', 'pobj'),
(' ', ''),
('intact', 'advmod'),
(',', 'punct'),
('and', 'cc'),
('Conan', 'nsubj'),
('went', 'conj'),
('to', 'prep'),
('Chicago', 'pobj'),
('.', 'punct')]
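As for telling the two cases apart (question 2): both are conj, but one rough heuristic, which is my own assumption and not something spaCy or the Stanford manual defines, is to check whether the second conjunct has a subject of its own. "went" heads its own nsubj ("Conan"), so "and" coordinates two clauses, while "snowboard" has no subject of its own in the parse above. A minimal sketch (`conj_kind` and `SUBJECT_LABELS` are hypothetical names; it assumes spaCy-like tokens with `.text`, `.dep_` and `.children`):

```python
# Subject labels used by spaCy's English models (assumed sufficient here)
SUBJECT_LABELS = {"nsubj", "nsubjpass", "csubj", "csubjpass", "expl"}

def conj_kind(token):
    """Classify a token whose dep_ is 'conj'.

    Heuristic, not a spaCy feature: if the conjunct has its own subject
    among its children, the coordination joins two clauses ("clausal");
    otherwise it joins words or phrases ("phrasal").
    """
    if token.dep_ != "conj":
        raise ValueError(f"{token.text!r} is not a conjunct")
    has_subject = any(child.dep_ in SUBJECT_LABELS for child in token.children)
    return "clausal" if has_subject else "phrasal"
```

For each token in `spacy_doc` with `token.dep_ == "conj"`, `conj_kind(token)` should return "clausal" for "went" and "phrasal" for "snowboard", given the parses shown above. It depends entirely on where the parser attaches subjects, so treat it as a starting point rather than a reliable classifier.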