I am using the following code to extract named entities with a lambda.
df['Place'] = df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])
and
df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])
For a few hundred records it extracts results quickly, but with thousands of records it takes pretty much forever. Can someone help me optimize this line of code?
You can improve performance by running nlp.pipe on the whole list of documents instead of calling nlp() once per row. Try:
import spacy
import pandas as pd

# Load the model once; only NER is needed, so disable unused pipeline components
nlp = spacy.load("en_core_web_md", disable=["tagger", "parser"])

df = pd.DataFrame({"Text": ["this is a text about Germany", "this is another about Trump"]})
texts = df["Text"].to_list()

# nlp.pipe processes the texts in batches, which is much faster than one nlp() call per row
ents = []
for doc in nlp.pipe(texts):
    for ent in doc.ents:
        if ent.label_ == "GPE":
            ents.append(ent)

print(ents)
[Germany]
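
If you need the results back in the DataFrame (like the Place column from your question), a minimal sketch along the same lines, assuming the same toy DataFrame as above, could look like this:

import spacy
import pandas as pd

nlp = spacy.load("en_core_web_md", disable=["tagger", "parser"])
df = pd.DataFrame({"Text": ["this is a text about Germany", "this is another about Trump"]})

# nlp.pipe preserves input order, so each doc lines up with its source row
df["Place"] = [
    [ent.text for ent in doc.ents if ent.label_ == "GPE"]
    for doc in nlp.pipe(df["Text"])
]
print(df)

For very large DataFrames you can also pass batch_size and n_process to nlp.pipe to batch and parallelize the work further.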