I was reproducing a spaCy rule-based matching example:
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_md")
doc = nlp("Good morning, I'm here. I'll say good evening!!")
pattern = [{"LOWER": "good"},{"LOWER": {"IN": ["morning", "evening"]}},{"IS_PUNCT": True}]
matcher.add("greetings", [pattern])  # match "good morning"/"good evening" with a single pattern using IN
matches = matcher(doc)
for mid, start, end in matches:
    print(start, end, doc[start:end])
which is supposed to match
Good morning good evening!
But the above code also matches "I" on both occasions:
0 3 Good morning,
3 4 I
7 8 I
10 13 good evening!
I just want to exclude the "I" from the matches.
Thank you
When I run your code on my machine (Windows 11 64-bit, Python 3.10.9, spaCy 3.4.4 with both the en_core_web_sm and en_core_web_trf pipelines), it produces a NameError because matcher is not defined. After defining matcher as an instance of the Matcher class in accordance with the spaCy Matcher documentation, I get the following (desired) output with both pipelines:
0 3 Good morning,
10 13 good evening!
The full working code is shown below. I'd suggest restarting your IDE and/or computer if you're still seeing your unexpected results.
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
doc = nlp("Good morning, I'm here. I'll say good evening!!")
matcher = Matcher(nlp.vocab)
pattern = [{"LOWER": "good"}, {"LOWER": {"IN": ["morning", "evening"]}}, {"IS_PUNCT": True}]
matcher.add("greetings", [pattern])  # match "good morning"/"good evening" with a single pattern using IN
matches = matcher(doc)
for match_id, start, end in matches:
    print(start, end, doc[start:end])
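
If restarting isn't convenient, one way to rule out leftover state is to build the Matcher inside a function so it can only contain the one pattern you add. The sketch below is only an illustration: the find_greetings helper is a hypothetical name, and it assumes a blank English pipeline (spacy.blank("en")) is sufficient, since LOWER and IS_PUNCT are lexical attributes that don't depend on a trained model. Whether stale patterns are really what caused the extra "I" matches is a guess; the question doesn't show enough to confirm it.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough here, because LOWER and
# IS_PUNCT are lexical attributes that need no trained components.
nlp = spacy.blank("en")


def find_greetings(text):
    """Hypothetical helper: return (start, end, text) for each greeting match."""
    doc = nlp(text)
    # A fresh Matcher per call holds only the pattern added below.
    matcher = Matcher(nlp.vocab)
    pattern = [
        {"LOWER": "good"},
        {"LOWER": {"IN": ["morning", "evening"]}},
        {"IS_PUNCT": True},
    ]
    matcher.add("greetings", [pattern])
    assert len(matcher) == 1  # exactly one rule is registered
    return [(start, end, doc[start:end].text) for _, start, end in matcher(doc)]


results = find_greetings("Good morning, I'm here. I'll say good evening!!")
for start, end, text in results:
    print(start, end, text)
```

If this prints only the two greeting spans while your original session still shows the "I" matches, that would point to something other than the pattern itself, such as extra rules registered on the matcher object earlier in the session.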