
Python spaCy: replace entities where ent.label_ == "PERSON" with something else


I am using spaCy in Python to replace every entity whose label_ is "PERSON" with "[XXX]". Detecting the entities seems to work, but I am struggling to replace them in my test string:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")

file_text = """This is my teststring. Isaac Newton is supposed to be changed."""

nlp.add_pipe("merge_entities")

def change_names(file_text):
    text_doc = nlp(file_text)
    mylist = []
    for ent in text_doc.ents:
        if ent.label_ == "PERSON":
            print(ent)
            mylist.append("[XXX]")
        else:
            mylist.append(ent.text)
    res = ''.join(mylist)
    print(res)
    print(text_doc)

change_names(file_text)

This results in:

Isaac Newton
[XXX]
This is my teststring. Isaac Newton is supposed to be changed.

The result should be: This is my teststring. [XXX] is supposed to be changed.

Now I want to iterate over my text_doc and replace every entity with label_ == "PERSON" with "[XXX]", but this is not working out for me. I tried a nested for loop that iterates over the string and, whenever an item is an entity, jumps into the for loop posted above. Any suggestions?


Solution

  • Since all you need is a string output, you can use

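    result = []
    for t in text_doc:
        if t.ent_type_ == "PERSON":
            # Token belongs to a PERSON entity: emit the placeholder instead
            result.append("[XXX]")
        else:
            result.append(t.text)
        # Keep the original spacing that followed this token
        result.append(t.whitespace_)

    res = ''.join(result)
    print(res)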
    

    That is:

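    • If the token is part of a PERSON entity, append [XXX] to the result list
    • Otherwise, append the token's text
    • Append the token's trailing whitespace (t.whitespace_), if any.

    Finally, join the result items. Because merge_entities is in the pipeline, a multi-word name such as Isaac Newton is a single token, so it yields one [XXX] rather than one per word.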
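
If you would rather not depend on merge_entities, one possible alternative is to work with the character offsets of the entities instead of the tokens. The following is only a sketch under that assumption: redact_spans and the nlp parameter of change_names are hypothetical names introduced here, and the spans are assumed to be non-overlapping (start_char, end_char, label) tuples, which is how doc.ents behaves.

```python
def redact_spans(text, spans, label="PERSON", placeholder="[XXX]"):
    """Replace every span carrying `label` with `placeholder`.

    `spans` is an iterable of (start_char, end_char, label) tuples that
    are assumed not to overlap.
    """
    pieces = []
    cursor = 0  # position in `text` up to which output has been produced
    for start, end, span_label in sorted(spans):
        if span_label != label:
            continue  # leave entities with other labels untouched
        pieces.append(text[cursor:start])  # text before the entity
        pieces.append(placeholder)         # the entity itself, redacted
        cursor = end
    pieces.append(text[cursor:])  # remainder after the last entity
    return "".join(pieces)


def change_names(file_text, nlp):
    """Run a loaded spaCy pipeline and redact PERSON entities."""
    doc = nlp(file_text)
    spans = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
    return redact_spans(file_text, spans)


# Hypothetical usage with the pipeline from the question:
#   nlp = spacy.load("en_core_web_sm")
#   print(change_names(file_text, nlp))
```

Which spans get replaced still depends on what the model recognizes as PERSON, so the output for a given text may vary between models.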