I am using the following code to extract named entities with a lambda.
df['Place'] = df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])
and
df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])
For a few hundred records it extracts results quickly, but with thousands of records it takes pretty much forever. Can someone help me optimize this line of code?
You can improve performance by running nlp.pipe on the whole list of documents instead of calling nlp() once per row. Try:
import spacy
import pandas as pd

# Load the model once; only NER is needed, so disable unused pipeline components
nlp = spacy.load("en_core_web_md", disable=["tagger", "parser"])

df = pd.DataFrame({"Text": ["this is a text about Germany", "this is another about Trump"]})
texts = df["Text"].to_list()

# nlp.pipe processes the texts in batches, which is much faster than one nlp() call per row
ents = []
for doc in nlp.pipe(texts):
    for ent in doc.ents:
        if ent.label_ == "GPE":
            ents.append(ent)

print(ents)
[Germany]
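
If you need the results back in the DataFrame (like the Place column from your question), a minimal sketch along the same lines, assuming the same toy DataFrame as above, could look like this:

import spacy
import pandas as pd

nlp = spacy.load("en_core_web_md", disable=["tagger", "parser"])
df = pd.DataFrame({"Text": ["this is a text about Germany", "this is another about Trump"]})

# nlp.pipe preserves input order, so each doc lines up with its source row
df["Place"] = [
    [ent.text for ent in doc.ents if ent.label_ == "GPE"]
    for doc in nlp.pipe(df["Text"])
]
print(df)

For very large DataFrames you can also pass batch_size and n_process to nlp.pipe to batch and parallelize the work further.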