spaCy DependencyMatcher pattern not returning matches

I am trying to create a pattern, add it to spaCy's DependencyMatcher, and get matches from it.

I created a pattern for the sentence: "From Monday to Friday"

The full pattern:

pattern = [
    {
        "RIGHT_ID": "node0",
        "RIGHT_ATTRS": {'DEP': 'ROOT', 'POS': 'ADP', 'TAG': 'IN'},
    },
    {
        "LEFT_ID": "node0",
        "REL_OP": ">",
        "RIGHT_ID": "node1",
        "RIGHT_ATTRS": {'DEP': 'pobj', 'POS': 'PROPN', 'TAG': 'NNP'},
    },
    {
        "LEFT_ID": "node1",
        "REL_OP": "$--",
        "RIGHT_ID": "node2",
        "RIGHT_ATTRS": {'DEP': 'prep', 'POS': 'ADP', 'TAG': 'IN'},
    },
    {
        "LEFT_ID": "node2",
        "REL_OP": ">",
        "RIGHT_ID": "node3",
        "RIGHT_ATTRS": {'DEP': 'pobj', 'POS': 'PROPN', 'TAG': 'NNP'},
    },
]

The simpler pattern is:

pattern = [
    {
        "RIGHT_ID": "node0",
        "RIGHT_ATTRS": {"POS": "ADP"},
    },
    {
        "LEFT_ID": "node0",
        "REL_OP": ">",
        "RIGHT_ID": "node1",
        "RIGHT_ATTRS": {"POS": "PROPN"},
    },
    {
        "LEFT_ID": "node1",
        "REL_OP": "$--",
        "RIGHT_ID": "node2",
        "RIGHT_ATTRS": {"POS": "ADP"},
    },
    {
        "LEFT_ID": "node2",
        "REL_OP": ">",
        "RIGHT_ID": "node3",
        "RIGHT_ATTRS": {'POS': 'PROPN'},
    },
]

[Image: displaCy dependency parse of the sentence]

My question is: why does neither the full pattern nor the simpler one return any matches?

import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)

text="From monday to friday"
doc = nlp(text)
matcher.add("pattern1", [pattern])

matches = matcher(doc)

# Each token_id corresponds to one pattern dict
match_id, token_ids = matches[0]

spaCy versions:

spaCy v3.0.6
en_core_web_sm  >=3.0.0,<3.1.0  3.0.0  ✔


  • Your REL_OP for node2 is backwards. It should be $++, because node2 ("to") comes after node1 ("Monday") in the sentence, not before it.
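
To see why the direction matters, here is a minimal pure-Python sketch of what the two sibling operators check, as I understand them from the spaCy docs: `A $++ B` means B has the same head as A and comes after it, while `A $-- B` means B has the same head and comes before it (A is the LEFT_ID node, B the RIGHT_ID node). This is a simplified model and not spaCy's implementation; the `heads` list is an assumed parse in which "From" is the root, and `right_siblings` and `left_siblings` are hypothetical helper names.

```python
# Simplified model of the sibling operators; NOT spaCy's implementation.
# Assumed parse: heads[i] is the index of token i's head (the root points at itself).
words = ["From", "Monday", "to", "Friday"]
heads = [0, 0, 0, 2]


def right_siblings(i):
    """Tokens B for which `A $++ B` holds with A = token i (same head, B after A)."""
    head = heads[i]
    return [j for j, h in enumerate(heads) if h == head and j > i and j != head]


def left_siblings(i):
    """Tokens B for which `A $-- B` holds with A = token i (same head, B before A)."""
    head = heads[i]
    return [j for j, h in enumerate(heads) if h == head and j < i and j != head]


monday = words.index("Monday")
print([words[j] for j in right_siblings(monday)])  # ['to']
print([words[j] for j in left_siblings(monday)])   # []
```

With node1 bound to "Monday", `$--` has no candidate for node2, so the whole pattern fails; `$++` finds "to".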

    For a complete example, this code works for me:

    import spacy
    from spacy.matcher import DependencyMatcher
    nlp = spacy.load("en_core_web_sm")
    matcher = DependencyMatcher(nlp.vocab)
    text="From Monday to Friday"
    doc = nlp(text)
    pattern = [
        {
            "RIGHT_ID": "node0",
            "RIGHT_ATTRS": {'POS': 'ADP', 'TAG': 'IN'},
        },
        {
            "LEFT_ID": "node0",
            "REL_OP": ">",
            "RIGHT_ID": "node1",
            "RIGHT_ATTRS": {'POS': 'PROPN'},
        },
        {
            "LEFT_ID": "node1",
            "REL_OP": "$++",
            "RIGHT_ID": "node2",
            "RIGHT_ATTRS": {'POS': 'ADP'},
        },
        {
            "LEFT_ID": "node2",
            "REL_OP": ">",
            "RIGHT_ID": "node3",
            "RIGHT_ATTRS": {'POS': 'PROPN'},
        },
    ]
    matcher.add("pattern1", [pattern])
    matches = matcher(doc)
    # this part is just for reference
    for word in doc:
        print(word.pos_, word.tag_, word.dep_, word, sep="\t")

    A couple of points about this:

    • Your second pattern is better: you shouldn't need to specify both TAG and POS for English, since the tag determines the POS.
    • In the v3 small model, "monday" and "friday" are not tagged as proper nouns unless capitalized. (It looks like your displaCy output is from the public demo, which uses v2.)
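
One more thing worth guarding against: `matches[0]` in the question raises an IndexError whenever the matcher returns an empty list, which is exactly what happens with the lowercase text. Below is a small sketch of a safer way to read the first match, relying on the fact noted in the question that each token id corresponds to one pattern dict. `first_match` is a hypothetical helper, and the data here are stand-ins shaped like the matcher's output rather than real results; with spaCy you would pass `matcher(doc)`, your pattern, and `doc` instead.

```python
def first_match(matches, pattern, tokens):
    """Return {RIGHT_ID: token} for the first match, or None if nothing matched."""
    if not matches:
        return None
    _match_id, token_ids = matches[0]
    # token_ids[k] is the token matched by pattern[k]
    return {node["RIGHT_ID"]: tokens[i] for node, i in zip(pattern, token_ids)}


# Stand-in data (illustrative only): a pattern reduced to its RIGHT_IDs,
# a token list, and one match in the (match_id, token_ids) shape.
pattern = [{"RIGHT_ID": f"node{k}"} for k in range(4)]
tokens = ["From", "Monday", "to", "Friday"]

print(first_match([], pattern, tokens))
print(first_match([(1, [0, 1, 2, 3])], pattern, tokens))
```

The first call prints None instead of crashing; the second prints a dict pairing node0 through node3 with the tokens they matched.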