Suppose I have created a spaCy model or dataset containing all the named entities, tagged as PERSON, from a certain text. How can I use it with the DependencyMatcher if I need to extract "person" - "root verb" pairs? In other words, I want the DependencyMatcher to rely not on its own model for identifying people's names, but on my existing dataset of names.
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_lg")

def on_match(matcher, doc, id, matches):
    return matches

patterns = [
    [  # pattern 1: (sur)name + verb, e.g. "Jack lived"
        {
            "RIGHT_ID": "person",
            "RIGHT_ATTRS": {"ENT_TYPE": "PERSON", "DEP": "nsubj"}
        },
        {
            "LEFT_ID": "person",
            "REL_OP": "<",
            "RIGHT_ID": "verb",
            "RIGHT_ATTRS": {"POS": "VERB"}
        }
    ]
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("PERVERB", patterns, on_match=on_match)
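The DependencyMatcher does not have a "custom model of identifying people's names"; that is the NER component in the pipeline you loaded. In this case you should:

1. Disable the NER component.
2. Use an EntityRuler to match the names from your list.
3. Use the DependencyMatcher as usual.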
To disable a component you can just do this:
nlp = spacy.load("en_core_web_lg", disable=["ner"])
To match names from your list with an EntityRuler, see the rule-based matching docs.
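As a rough illustration of the EntityRuler step, here is a minimal sketch. It assumes spaCy v3, and the `names` list is a hypothetical stand-in for your own dataset. A blank English pipeline is used only so the snippet runs without downloading a model; with a real pipeline you would load it with `disable=["ner"]` as shown above and add the ruler to that instead.

```python
import spacy

# Blank pipeline used here only so the sketch runs without a model download.
# With a trained pipeline: nlp = spacy.load("en_core_web_lg", disable=["ner"])
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Hypothetical list of names standing in for your own dataset
names = ["Jack", "Mary Smith"]
ruler.add_patterns([{"label": "PERSON", "pattern": name} for name in names])

doc = nlp("Jack lived in London with Mary Smith.")
people = [(ent.text, ent.label_) for ent in doc.ents]
print(people)
```

Since no statistical NER component runs here, only the names from the list end up in `doc.ents`, so `ENT_TYPE: PERSON` in a DependencyMatcher pattern would match those tokens and nothing else.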
Note that the above assumes you have a list of names, rather than annotations marking exactly which spans in your sentences are names. If you have explicitly annotated names, then you can skip step 2 (the EntityRuler): disabling the NER component will be enough to leave only your existing annotations.
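For the annotated case, a sketch along the following lines may help. It assumes spaCy v3 and that your annotations are character offsets, which is an assumption about your data format. The `Doc` is built by hand with a made-up dependency parse so the snippet runs without a model; in practice the parse would come from your pipeline with NER disabled.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-written parse standing in for the output of a real parser
doc = Doc(
    nlp.vocab,
    words=["Jack", "lived", "in", "London", "."],
    heads=[1, 1, 1, 2, 1],  # absolute index of each token's head
    deps=["nsubj", "ROOT", "prep", "pobj", "punct"],
    pos=["PROPN", "VERB", "ADP", "PROPN", "PUNCT"],
)

# Hypothetical existing annotation: characters 0-4 ("Jack") are a PERSON
doc.set_ents([doc.char_span(0, 4, label="PERSON")])

pattern = [
    {"RIGHT_ID": "person", "RIGHT_ATTRS": {"ENT_TYPE": "PERSON", "DEP": "nsubj"}},
    {"LEFT_ID": "person", "REL_OP": "<", "RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
]
matcher = DependencyMatcher(nlp.vocab)
matcher.add("PERVERB", [pattern])

# Each match is (match_id, token_ids), with token_ids in pattern order
pairs = [(doc[ids[0]].text, doc[ids[1]].text) for _, ids in matcher(doc)]
print(pairs)
```

"London" is not matched as a person here because only the spans you annotate carry an entity type.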