I am trying to create a pattern, add it to a matcher, and get results from it using spaCy's DependencyMatcher.
I created a pattern for the sentence: "From Monday to Friday"
The full pattern:
pattern = [
    {
        "RIGHT_ID": "node0",
        "RIGHT_ATTRS": {'DEP': 'ROOT', 'POS': 'ADP', 'TAG': 'IN'}
    },
    {
        "LEFT_ID": "node0",
        "REL_OP": ">",
        "RIGHT_ID": "node1",
        "RIGHT_ATTRS": {'DEP': 'pobj', 'POS': 'PROPN', 'TAG': 'NNP'},
    },
    {
        "LEFT_ID": "node1",
        "REL_OP": "$--",
        "RIGHT_ID": "node2",
        "RIGHT_ATTRS": {'DEP': 'prep', 'POS': 'ADP', 'TAG': 'IN'},
    },
    {
        "LEFT_ID": "node2",
        "REL_OP": ">",
        "RIGHT_ID": "node3",
        "RIGHT_ATTRS": {'DEP': 'pobj', 'POS': 'PROPN', 'TAG': 'NNP'},
    },
]
The simpler pattern is:
pattern = [
    {
        "RIGHT_ID": "node0",
        "RIGHT_ATTRS": {"POS": "ADP"}
    },
    {
        "LEFT_ID": "node0",
        "REL_OP": ">",
        "RIGHT_ID": "node1",
        "RIGHT_ATTRS": {"POS": "PROPN"},
    },
    {
        "LEFT_ID": "node1",
        "REL_OP": "$--",
        "RIGHT_ID": "node2",
        "RIGHT_ATTRS": {"POS": "ADP"},
    },
    {
        "LEFT_ID": "node2",
        "REL_OP": ">",
        "RIGHT_ID": "node3",
        "RIGHT_ATTRS": {'POS': 'PROPN'},
    },
]
My question is: why does neither the full pattern nor the simpler one produce any matches?
import spacy
from spacy.matcher import DependencyMatcher
nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)
text="From monday to friday"
doc = nlp(text)
matcher.add("pattern1", [pattern])
matches = matcher(doc)
# Each token_id corresponds to one pattern dict
match_id, token_ids = matches[0]
spaCy versions:
spaCy v3.0.6
NAME              SPACY             VERSION
en_core_web_sm    >=3.0.0,<3.1.0    3.0.0   ✔
Your REL_OP for node2 is backwards. It should be $++.
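To see why the direction matters, here is a minimal pure-Python sketch of how the two sibling operators behave. It is a simplified model, not spaCy's own implementation: the helper names (`is_sibling`, `right_sibling`, `left_sibling`) are made up for illustration, and the `heads` list is the parse I would expect for this sentence ("From" as root, "Monday" and "to" attached to "From", "Friday" attached to "to"). In a pattern dict, LEFT_ID plays the role of A and RIGHT_ID the role of B.

```python
# Simplified model of the DependencyMatcher sibling operators.
# This is an illustration only, not spaCy's implementation.
# heads[i] is the index of token i's head; the root points to itself.

def is_sibling(heads, a, b):
    """True if a and b are distinct non-root tokens with the same head."""
    is_root_a = heads[a] == a
    is_root_b = heads[b] == b
    return a != b and not is_root_a and not is_root_b and heads[a] == heads[b]

def right_sibling(heads, a, b):
    """Models `A $++ B`: B is a sibling of A and comes after A."""
    return is_sibling(heads, a, b) and a < b

def left_sibling(heads, a, b):
    """Models `A $-- B`: B is a sibling of A and comes before A."""
    return is_sibling(heads, a, b) and a > b

# Assumed parse of "From Monday to Friday" (hypothetical, check with your model).
words = ["From", "Monday", "to", "Friday"]
heads = [0, 0, 0, 2]

node1, node2 = 1, 2  # "Monday" (LEFT_ID) and "to" (RIGHT_ID)
print(left_sibling(heads, node1, node2))   # False: "to" does not precede "Monday"
print(right_sibling(heads, node1, node2))  # True: "to" follows "Monday"
```

Because node1 ("Monday") is the left-hand node and "to" comes after it, only the `$++` relation can hold.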
For a complete example, this code works for me:
import spacy
from spacy.matcher import DependencyMatcher
nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)
text="From Monday to Friday"
doc = nlp(text)
pattern = [
    {
        "RIGHT_ID": "node0",
        "RIGHT_ATTRS": {'POS': 'ADP', 'TAG': 'IN'}
    },
    {
        "LEFT_ID": "node0",
        "REL_OP": ">",
        "RIGHT_ID": "node1",
        "RIGHT_ATTRS": {'POS': 'PROPN'},
    },
    {
        "LEFT_ID": "node1",
        "REL_OP": "$++",
        "RIGHT_ID": "node2",
        "RIGHT_ATTRS": {'POS': 'ADP'},
    },
    {
        "LEFT_ID": "node2",
        "REL_OP": ">",
        "RIGHT_ID": "node3",
        "RIGHT_ATTRS": {'POS': 'PROPN'},
    },
]
matcher.add("pattern1", [pattern])
matches = matcher(doc)
print(matches)
print("-----")
# this part is just for reference
for word in doc:
    print(word.pos_, word.tag_, word.dep_, word, sep="\t")
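If it helps when reading the printed matches: each match is a `(match_id, token_ids)` pair, and the token ids line up one-to-one with the pattern dicts, in order. Below is a small sketch of pairing them up. `label_match` is a hypothetical helper, the pattern is trimmed to its RIGHT_ID keys, and the `token_ids` value is only what I would expect for this sentence, not captured output.

```python
def label_match(pattern, token_ids, words):
    """Pair each pattern node's RIGHT_ID with the word it matched.

    token_ids[i] is the index of the token matched by pattern[i].
    """
    return {node["RIGHT_ID"]: words[i] for node, i in zip(pattern, token_ids)}

# Pattern reduced to node names; the attribute dicts are omitted for brevity.
pattern = [
    {"RIGHT_ID": "node0"},
    {"RIGHT_ID": "node1"},
    {"RIGHT_ID": "node2"},
    {"RIGHT_ID": "node3"},
]
words = ["From", "Monday", "to", "Friday"]
token_ids = [0, 1, 2, 3]  # assumed match, for illustration only

print(label_match(pattern, token_ids, words))
# {'node0': 'From', 'node1': 'Monday', 'node2': 'to', 'node3': 'Friday'}
```

With a real Doc you would use `doc[i].text` in place of `words[i]`.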
Couple of points about this: