Extract named entities using spaCy and a Python lambda

I am using the following code to extract named entities with a lambda:

df['Place'] = df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])


df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])

This works for a few hundred records, but with thousands of records it takes practically forever. Can someone help me optimize this line of code?


  • You can speed this up by:

    1. Calling nlp.pipe on the whole list of documents instead of calling nlp on each row.
    2. Disabling pipeline components you do not need.
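
The lambda in the question returns one result per row, so a batched version has to keep that row alignment before it can be assigned back to a column. The helper below is a minimal sketch of that idea, not code from the question or the answer: `extract_entities`, its parameters, and the `batch_size` default are hypothetical and chosen for illustration. It only assumes that `nlp` has a `pipe` method yielding documents with `.ents`, as a loaded spaCy pipeline does.

```python
from typing import Iterable, List


def extract_entities(nlp, texts: Iterable[str], label: str = "GPE",
                     batch_size: int = 256) -> List[List[str]]:
    """Return one list of matching entity strings per input text.

    `nlp` is assumed to be a loaded spaCy pipeline; `batch_size` is an
    arbitrary illustrative value, not a tuned recommendation.
    """
    results = []
    # nlp.pipe streams the texts through the pipeline in batches and
    # yields the docs in input order, so results stay aligned with rows.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        results.append([ent.text for ent in doc.ents if ent.label_ == label])
    return results


# Hypothetical usage with the DataFrame from the question:
#   df["Place"] = extract_entities(nlp, df["Text"].tolist())
# To keep only the first match per row, or '' when there is none:
#   df["Place"] = [(ents or [""])[0] for ents in df["Place"]]
```

Because the output has exactly one list per input text, rows with no match get an empty list rather than being dropped.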


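    import pandas as pd
    import spacy

    # Load the model without the components that NER does not depend on.
    nlp = spacy.load("en_core_web_md", disable=["tagger", "parser"])
    df = pd.DataFrame({"Text": ["this is a text about Germany", "this is another about Trump"]})
    texts = df["Text"].to_list()
    ents = []
    # nlp.pipe processes the texts in batches rather than one call per row.
    for doc in nlp.pipe(texts):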
        for ent in doc.ents:
            if ent.label_ == "GPE":
