Hi everyone, I am running this code in spaCy to match a pattern with the Matcher, but I get an error:
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_md")
doc1 = nlp("Hello hello hello, how are you?")
doc2 = nlp("Hello, how are you?")
doc3 = nlp("How are you?")
pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*",{"IS_PUNCT": True}}]
matcher.add("greetings", [pattern])
for mid, start, end in matcher(doc1):
    print(start, end, doc1[start:end])
The error is
pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*",{"IS_PUNCT": True}}]
^
SyntaxError: invalid syntax
I am following the book Mastering spaCy and copy-pasted the code from it, taking care not to include any special characters.
Regards
"A pattern added to the Matcher consists of a list of dictionaries." (from the docs). Your code, written more legibly:
pattern = [
    {
        "LOWER": {"IN": ["hello", "hi", "hallo"]},
        "OP": "*",
        {"IS_PUNCT": True}
    }
]
The first dictionary has three entries, but the third is malformed: every entry in a dictionary must have the form key: value, and the third is a lone value with no key, which is not valid dictionary syntax.
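To see that this is purely a Python syntax problem and has nothing to do with spaCy, here is a small sketch using only the standard library. The helper name `parses` is made up for illustration; the second string is just a syntactically valid variant, not necessarily the pattern you want.

```python
import ast

# The pattern from the question: the third entry is a bare value with no key.
broken = '[{"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*", {"IS_PUNCT": True}}]'

# A variant where every entry is a key: value pair (valid syntax only;
# whether it is a sensible Matcher pattern is a separate question).
well_formed = '[{"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*", "IS_PUNCT": True}]'


def parses(source):
    """Return True if source is a valid Python literal, False on SyntaxError."""
    try:
        ast.literal_eval(source)
        return True
    except SyntaxError:
        return False


broken_ok = parses(broken)
well_formed_ok = parses(well_formed)
print(broken_ok, well_formed_ok)
```

The first string is rejected by the parser before any spaCy code runs, which is why the traceback points at the pattern line itself.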
Along those lines, the docs also say: "Each dictionary describes one token and its attributes." A token that, lowercased, is in ["hello", "hi", "hallo"] can never also be punctuation. You seem to want to match something like "Hi Hi Hello!", that is, two token descriptions with the first of them allowing for repetition; this would be matched by something like
pattern = [
    {
        "LOWER": {"IN": ["hello", "hi", "hallo"]},
        "OP": "*",
    },
    {"IS_PUNCT": True}
]
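For completeness, here is a minimal end-to-end sketch of the two-dictionary pattern in use. It makes two assumptions that are not in your code: it creates the matcher with `Matcher(nlp.vocab)`, which your snippet never does, and it uses `spacy.blank("en")` instead of `en_core_web_md` so that it runs without downloading a model (`LOWER` and `IS_PUNCT` are lexical attributes, so the tokenizer alone should be enough).

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline (tokenizer only) stands in for
# en_core_web_md, since this pattern needs no trained components.
nlp = spacy.blank("en")

# The matcher has to be created from the pipeline's vocabulary before use.
matcher = Matcher(nlp.vocab)

pattern = [
    # Zero or more greeting tokens, compared case-insensitively.
    {"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*"},
    # Followed by exactly one punctuation token.
    {"IS_PUNCT": True},
]
matcher.add("greetings", [pattern])

doc1 = nlp("Hello hello hello, how are you?")

# Collect (start, end) token offsets for every match.
spans = [(start, end) for _, start, end in matcher(doc1)]
for start, end in spans:
    print(start, end, doc1[start:end])
```

Because "*" also allows zero greetings, a punctuation token on its own (such as the final "?") matches too. If you want at least one greeting before the punctuation, "OP": "+" is probably closer to what you are after.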