Spacy NER: Extract all Persons before a specific word


I know I can use spaCy's named entity recognition to extract persons from a text, but I only want to extract the person or persons who appear before the word "asked".

Should I use Matcher together with NER? I am new to spaCy, so apologies if the question is simple.

Desired Output:
Louis Ng

Current Output:
Louis Ng
Lam Pin Min


import spacy

nlp = spacy.load("en_core_web_trf")


doc = nlp(
    "Mr Louis Ng asked what kind of additional support can we give to sectors and businesses where the human interaction cannot be mechanised. Mr Lam Pin Min replied that businesses could hire extra workers in such cases."
)

for ent in doc.ents:
    # Print the entity text and label
    print(ent.text, ent.label_)


Solution

  • You can use a Matcher to find PERSON entities that precede a specific word:

    pattern = [{"ENT_TYPE": "PERSON"}, {"ORTH": "asked"}]
    

    Because each dict corresponds to a single token, this pattern would only match the last token of the entity ("Ng") followed by "asked". You could let the first dict match more than one token with {"ENT_TYPE": "PERSON", "OP": "+"}, but this runs the risk of matching two adjacent person entities as one span in an example like "Before Ms X spoke to Ms Y Ms Z asked ...".

    To be able to match a single entity more easily with a Matcher, you can add the component merge_entities to the end of your pipeline (https://spacy.io/api/pipeline-functions#merge_entities), which merges each entity into a single token. Then this pattern would match "Louis Ng" as one token.
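
The merge_entities approach can be sketched roughly as follows, assuming spaCy v3. To keep the example runnable without downloading a model, it uses a blank English pipeline with an entity_ruler as a stand-in for the statistical NER in en_core_web_trf; that substitution, the rule name PERSON_BEFORE_ASKED, and the helper persons_before_asked are illustrative assumptions, not part of the original answer.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank pipeline plus an entity_ruler stands in for the
# statistical NER of en_core_web_trf, so no model download is needed.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Louis Ng"},
    {"label": "PERSON", "pattern": "Lam Pin Min"},
])

# merge_entities retokenizes each entity span into a single token,
# so one pattern dict can cover a whole multi-word name.
nlp.add_pipe("merge_entities")

matcher = Matcher(nlp.vocab)
# A PERSON token immediately followed by the exact word "asked".
matcher.add("PERSON_BEFORE_ASKED", [[{"ENT_TYPE": "PERSON"}, {"ORTH": "asked"}]])


def persons_before_asked(doc):
    """Return the text of each merged PERSON token directly followed by "asked"."""
    # Each match spans two tokens; the first one is the merged person entity.
    return [doc[start].text for _match_id, start, _end in matcher(doc)]


doc = nlp(
    "Mr Louis Ng asked what kind of additional support can we give to sectors "
    "and businesses where the human interaction cannot be mechanised. "
    "Mr Lam Pin Min replied that businesses could hire extra workers in such cases."
)
persons = persons_before_asked(doc)
print(persons)
```

With a trained pipeline, the entity_ruler lines would be dropped and nlp.add_pipe("merge_entities") called on the pipeline returned by spacy.load, so that the model's own PERSON predictions are merged. Note that merging changes the tokenization for every later component, which may matter if other code relies on per-word tokens.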