Tags: python, python-3.x, dataframe, nlp, named-entity-recognition

Apply named-entity recognition to specific dataframe columns


I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'source': ['Paul', 'Paul'],
                   'target': ['GOOGLE', 'Ferrari'],
                   'edge': ['works at', 'drive']
                   })

df
    source  target   edge
0   Paul    GOOGLE   works at
1   Paul    Ferrari  drive

I want to apply named-entity recognition (NER) to the source and target columns.

Expected outcome:

    source  target        edge
0   PERSON  ORGANIZATION  works at
1   PERSON  CAR           drive

I tried the following function:

!python -m spacy download en_core_web_sm

import spacy
nlp = spacy.load('en_core_web_sm')

def ner(df):
    df['source_entities'] = df['source'].apply(lambda x: nlp(x).label_)
    df['target_entities'] = df['target'].apply(lambda x: nlp(x).label_)
    return df

But when I call ner(df), I get an error:

AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'label_'

Any ideas on how to reach the expected outcome?


Solution

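  • nlp(x) returns a spaCy Doc object, not a single entity, and a Doc has no label_ attribute. That is why you get the error. The recognized entities are in the Doc's ents attribute, and each entity has its own label_, so collect the labels by iterating over nlp(x).ents.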

    Replace

    def ner(df):
      df['source_entities'] = df['source'].apply(lambda x: nlp(x).label_)
      df['target_entities'] = df['target'].apply(lambda x: nlp(x).label_)
      return df
    

    With

    def ner(df):
      df['source_entities'] = df['source'].apply(lambda x: [ent.label_ for ent in nlp(x).ents])
      df['target_entities'] = df['target'].apply(lambda x: [ent.label_ for ent in nlp(x).ents])
      return df
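
The fix above stores a list of labels per row, and the labels come from the model's own scheme. en_core_web_sm uses labels such as PERSON and ORG, has no CAR label, and may return an empty list for a one-word string. If you need exactly the labels from the expected outcome, one option is a rule-based entity_ruler. The following is a minimal sketch under that assumption, not the only way to do it: the patterns, the custom labels ORGANIZATION and CAR, and the helper first_label are made up for illustration. It uses a blank pipeline, so no model download is needed, and nlp.pipe to process each column in one pass.

```python
import pandas as pd
import spacy

# Blank English pipeline plus a rule-based entity_ruler.
# The patterns and labels below are illustrative assumptions.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Paul"},
    {"label": "ORGANIZATION", "pattern": "GOOGLE"},
    {"label": "CAR", "pattern": "Ferrari"},
])


def first_label(texts):
    """Return the first entity label for each text, or None if none is found."""
    # nlp.pipe processes the texts as a stream instead of one call per row.
    return [doc.ents[0].label_ if doc.ents else None for doc in nlp.pipe(texts)]


def ner(df, columns=("source", "target")):
    """Return a copy of df with the given columns replaced by entity labels."""
    out = df.copy()
    for col in columns:
        out[col] = first_label(out[col].astype(str).tolist())
    return out


df = pd.DataFrame({'source': ['Paul', 'Paul'],
                   'target': ['GOOGLE', 'Ferrari'],
                   'edge': ['works at', 'drive']})
result = ner(df)
print(result)
```

Here ner returns a new dataframe and leaves the edge column untouched. Values that match no pattern come back as None, so you can spot rows where nothing was recognized.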