Tags: python, nlp, spacy, named-entity-recognition

Python named entity recognition (NER): Replace named entities with labels


I'm new to Python NER and am trying to replace named entities in text input with their labels.

from nerd import ner
input_text = """Stack Overflow is a question and answer site for professional and enthusiast programmers. It is a privately held website, the flagship site of the Stack Exchange Network,[5][6][7] created in 2008 by Jeff Atwood and Joel Spolsky."""
doc = ner.name(input_text, language='en_core_web_sm')
text_label = [(X.text, X.label_) for X in doc]
print(text_label)

The output is: [('2008', 'DATE'), ('Jeff Atwood', 'PERSON'), ('Joel Spolsky', 'PERSON')]

I can then extract the people, for example:

people = [text for text, label in text_label if label == 'PERSON']
print(people)

to get ['Jeff Atwood', 'Joel Spolsky'].

My question is: how can I replace the identified named entities in the original input text with their labels, so that the result is:

Stack Overflow is a question and answer site for professional and enthusiast programmers. It is a privately held website, the flagship site of the Stack Exchange Network,[5][6][7] created in DATE by PERSON and PERSON.

Thanks so much!


Solution

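  • You can loop over text_label and replace each entity's text with its label. Note that str.replace replaces every occurrence of the substring, so an entity such as '2008' would also be replaced inside a longer token like '20080':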

    for text, label in text_label:
        input_text = input_text.replace(text, label)
    
    print(input_text)
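
  • If substring collisions are a concern, one possible refinement is to do all replacements in a single regex pass. The sketch below is only an illustration under assumptions: replace_entities is a hypothetical helper name, the sample sentence is made up, and it assumes text_label is a list of (text, label) tuples as in the question. It tries longer entities first, refuses matches that start or end inside a longer word or number, and never re-scans text it has already replaced.

```python
import re


def replace_entities(text, text_label):
    """Replace each entity string in `text` with its label in one pass.

    Hypothetical helper: `text_label` is assumed to be a list of
    (entity_text, label) tuples, as built in the question.
    """
    # Map entity text -> label; if the same text appears twice, the first label wins.
    mapping = {}
    for ent_text, label in text_label:
        mapping.setdefault(ent_text, label)
    if not mapping:
        return text

    # Try longer entities first so a short entity cannot pre-empt a longer one.
    alternatives = sorted(mapping, key=len, reverse=True)

    # The lookarounds reject matches that begin or end inside a word or number.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in alternatives) + r")(?!\w)"
    )

    # A single sub() call means inserted labels are never matched again.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


text_label = [('2008', 'DATE'), ('Jeff Atwood', 'PERSON'), ('Joel Spolsky', 'PERSON')]
sample = "Created in 2008 by Jeff Atwood and Joel Spolsky; build 20080 is unrelated."
print(replace_entities(sample, text_label))
```

    This still replaces every standalone occurrence of an entity string, not only the ones the model tagged. If the NER result exposes character offsets for each entity, replacing by offset from the end of the text backwards would likely be more precise.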