I know I can use spaCy's named entity recognition to extract persons from a text, but I only want to extract the person or persons who appear directly before the word "asked".
Should I use Matcher together with NER? I am new to spaCy, so apologies if the question is simple.
Desired Output:
Louis Ng
Current Output:
Louis Ng
Lam Pin Min
import spacy

nlp = spacy.load("en_core_web_trf")
doc = nlp(
    "Mr Louis Ng asked what kind of additional support can we give to sectors and businesses where the human interaction cannot be mechanised. Mr Lam Pin Min replied that businesses could hire extra workers in such cases."
)
for ent in doc.ents:
    # Print the entity text and label
    print(ent.text, ent.label_)
You can use a Matcher to find PERSON entities that precede a specific word:
pattern = [{"ENT_TYPE": "PERSON"}, {"ORTH": "asked"}]
Because each dict corresponds to a single token, this pattern would only match the last word of the entity ("Ng"). You could let the first dict match more than one token with {"ENT_TYPE": "PERSON", "OP": "+"}, but this runs the risk of matching two person entities in a row in an example like "Before Ms X spoke to Ms Y Ms Z asked ...".
To be able to match a single entity more easily with a Matcher, you can add the component merge_entities to the end of your pipeline (https://spacy.io/api/pipeline-functions#merge_entities), which merges each entity into a single token. Then this pattern would match "Louis Ng" as one token.
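Putting the pieces above together, a minimal sketch might look like the following. So that it runs without downloading a model, it assumes a blank English pipeline with an entity_ruler standing in for the trained NER; that stand-in, its two hard-coded names, and the pattern name "PERSON_ASKED" are illustrative assumptions, not part of the original question. With a real model you would call spacy.load("en_core_web_trf") instead and keep only the merge_entities and Matcher parts.

```python
import spacy
from spacy.matcher import Matcher

# ASSUMPTION: a blank pipeline plus an entity_ruler stands in for a trained
# NER model so that this sketch runs without a model download.
# In practice, replace these lines with: nlp = spacy.load("en_core_web_trf")
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [
        {"label": "PERSON", "pattern": "Louis Ng"},
        {"label": "PERSON", "pattern": "Lam Pin Min"},
    ]
)

# Merge each entity span into a single token. This component has to come
# after whatever sets doc.ents, so it is added last.
nlp.add_pipe("merge_entities")

# One (merged) PERSON token immediately followed by the word "asked".
# "PERSON_ASKED" is just a hypothetical name for this pattern.
matcher = Matcher(nlp.vocab)
matcher.add("PERSON_ASKED", [[{"ENT_TYPE": "PERSON"}, {"ORTH": "asked"}]])

text = (
    "Mr Louis Ng asked what kind of additional support can we give to "
    "sectors and businesses where the human interaction cannot be mechanised. "
    "Mr Lam Pin Min replied that businesses could hire extra workers in such cases."
)
doc = nlp(text)

# Each match covers two tokens: the merged person and "asked".
# The first token of the match is the person.
askers = [doc[start].text for _match_id, start, _end in matcher(doc)]
print(askers)  # ['Louis Ng']
```

Because the pattern uses ORTH, it only matches the exact form "asked"; using {"LOWER": "asked"} instead would also catch capitalised variants.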