I am using spaCy with Matcher to detect some words. When I want to find a word with a single punctuation mark like -, it works:
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# pattern: nice, punctuation, word
pattern = [{"LOWER": "nice"}, {"IS_PUNCT": True}, {"LOWER": "word"}]
matcher.add("nice-word", [pattern])
doc = nlp("This is a nice-word also? Why is this a nice word")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]
    span = doc[start:end]
    print(match_id, string_id, start, end, span.text)
Output:
1899655961849619838 nice-word 3 6 nice-word
This works great! But when the word contains two hyphens, I can't get it to work. For example, I would like to find nice-word-also. Here is some reproducible code:
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# pattern: nice, punctuation, word, also
pattern = [{"LOWER": "nice"}, {"IS_PUNCT": True}, {"LOWER": "word"}, {"LOWER": "also"}]
matcher.add("nice-word-also", [pattern])
doc = nlp("This is a nice-word-also? Why is this a nice word")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]
    span = doc[start:end]
    print(match_id, string_id, start, end, span.text)
This doesn't return anything. Does anyone know how to use spaCy's Matcher to detect words with double punctuation like the example above?
You are missing one {"IS_PUNCT": True} in your pattern. The tokenizer splits nice-word-also into five tokens (nice, -, word, -, also), and the Matcher needs one dict per token:
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# pattern: nice, punctuation, word, punctuation, also
pattern = [{"LOWER": "nice"}, {"IS_PUNCT": True}, {"LOWER": "word"}, {"IS_PUNCT": True}, {"LOWER": "also"}]
matcher.add("nice-word-also", [pattern])
doc = nlp("This is a nice-word-also? Why is this a nice word")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]
    span = doc[start:end]
    print(match_id, string_id, start, end, span.text)
# Output:
9732713127922352434 nice-word-also 3 8 nice-word-also
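If you do not want to hard-code a separate pattern for each length, the rule can also be written with the Matcher's "ORTH" and "OP" keys so that one pattern covers both nice-word and nice-word-also. The snippet below is only a minimal sketch of that idea, not part of the original question; the greedy argument to matcher.add assumes spaCy v3:
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# The tokenizer splits "nice-word-also" into five tokens, one dict each:
print([t.text for t in nlp("nice-word-also")])  # ['nice', '-', 'word', '-', 'also']
# {"ORTH": "-"} matches the hyphen specifically instead of any punctuation,
# and "OP": "?" makes the trailing "-also" part optional.
pattern = [
    {"LOWER": "nice"},
    {"ORTH": "-"},
    {"LOWER": "word"},
    {"ORTH": "-", "OP": "?"},
    {"LOWER": "also", "OP": "?"},
]
# greedy="LONGEST" keeps only the longest of the overlapping matches that
# the optional tokens would otherwise produce (spaCy v3).
matcher.add("nice-word", [pattern], greedy="LONGEST")
doc = nlp("This is a nice-word-also? Why is this a nice-word")
for match_id, start, end in matcher(doc):
    print(nlp.vocab.strings[match_id], doc[start:end].text)
Matching the literal hyphen with "ORTH" also avoids treating the trailing question mark as the optional punctuation token, which {"IS_PUNCT": True} would allow.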