Is there anyway by SpaCy to replace entity detected by SpaCy NER with its label? For example: I am eating an apple while playing with my Apple Macbook.
I have trained NER model with SpaCy to detect "FRUITS" entity and the model successfully detects the first "apple" as "FRUITS", but not the second "Apple".
I want to do post-processing of my data by replacing each entity with its label, so I want to replace the first "apple" with "FRUITS". The sentence will be "I am eating an FRUITS while playing with my Apple Macbook."
If I simply use regex, it will replace the second "Apple" with "FRUITS" as well, which is incorrect. Is there any smart way to do this?
Thanks!
the entity label is an attribute of the token (see here)
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_lg')
s = "His friend Nicolas is here."
doc = nlp(s)
print([t.text if not t.ent_type_ else t.ent_type_ for t in doc])
# ['His', 'friend', 'PERSON', 'is', 'here', '.']
print(" ".join([t.text if not t.ent_type_ else t.ent_type_ for t in doc]) )
# His friend PERSON is here .
Edit:
In order to handle cases were entities can span several words the following code can be used instead:
s = "His friend Nicolas J. Smith is here with Bart Simpon and Fred."
doc = nlp(s)
newString = s
for e in reversed(doc.ents): #reversed to not modify the offsets of other entities when substituting
start = e.start_char
end = start + len(e.text)
newString = newString[:start] + e.label_ + newString[end:]
print(newString)
#His friend PERSON is here with PERSON and PERSON.
Update:
Jinhua Wang brought to my attention that there is now a more built-in and simpler way to do this using the merge_entities pipe. See Jinhua's answer below.