I am extending a spaCy model using rules. While looking through the documentation, I noticed the IN
attribute, which lets a token pattern map to a dictionary of properties instead of a single value. This is great; however, it only works on single tokens.
For example, this pattern:
{"label":"EXAMPLE","pattern":[{"LOWER": {"IN": ["such as", "like", "for example"]}}]}
will only match the term "like", but not the others, because "such as" and "for example" are each split into two tokens.
What is the best way to achieve the same result for multi-token terms?
It depends on how complicated the intended patterns are, but the PhraseMatcher
can handle cases like the one above by matching on the LOWER attribute:
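import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")
# attr="LOWER" compares lowercased token text, so matching is case-insensitive
pmatcher = PhraseMatcher(nlp.vocab, attr="LOWER")
phrases = ["such as", "like", "for example"]
# Each phrase is converted to a Doc, so multi-token phrases are allowed
pmatcher.add("EXAMPLE", [nlp(x) for x in phrases])
# Each match is (match_id, start, end); match_id is the hash of "EXAMPLE"
assert pmatcher(nlp("Things Such As Books")) == [(15373972490796046842, 1, 3)]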
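If you would rather stay with the {"label": ..., "pattern": ...} format from the question, one option is to give every phrase its own token pattern, with one {"LOWER": ...} dict per token. The following is only a minimal sketch, assuming spaCy v3 (where the component is added by the string name "entity_ruler"); the example sentence and the variable names are my own, not from the question.

```python
import spacy

nlp = spacy.blank("en")
phrases = ["such as", "like", "for example"]

# Build one token pattern per phrase: the tokenizer splits the phrase,
# and each resulting token gets its own {"LOWER": ...} dict.
patterns = [
    {
        "label": "EXAMPLE",
        "pattern": [{"LOWER": tok.lower_} for tok in nlp.make_doc(phrase)],
    }
    for phrase in phrases
]

# Assumption: spaCy v3 API, where add_pipe takes the factory name.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(patterns)

# Hypothetical test sentence to show case-insensitive, multi-token matching.
doc = nlp("I enjoy fruit, For Example apples")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)
```

This keeps everything in token patterns, so you can still mix in other token attributes (POS, OP and so on) later, which plain phrase matching does not allow. For a long list of fixed phrases, the PhraseMatcher approach above is likely the simpler choice.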