I have the following dataframe:
import pandas as pd

df = pd.DataFrame({'source': ['Paul', 'Paul'],
                   'target': ['GOOGLE', 'Ferrari'],
                   'edge': ['works at', 'drive']
                   })
df
  source   target      edge
0   Paul   GOOGLE  works at
1   Paul  Ferrari     drive
I want to apply Named-Entity Recognition (NER) to the columns.
Expected outcome:
   source        target      edge
0  PERSON  ORGANIZATION  works at
1  PERSON           CAR     drive
I tried the following function:
!python -m spacy download en_core_web_sm
import spacy
nlp = spacy.load('en_core_web_sm')
def ner(df):
    df['source_entities'] = df['source'].apply(lambda x: nlp(x).label_)
    df['target_entities'] = df['target'].apply(lambda x: nlp(x).label_)
    return df
But when I call ner(df), I get an error:
AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'label_'
Any ideas on how to reach the expected outcome?
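nlp(x) returns a spaCy Doc object, and a Doc has no label_ attribute, which is why you get that error. The labels live on the individual entity spans in the Doc's ents attribute, so you need to iterate over nlp(x).ents and read label_ from each entity.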
Replace
def ner(df):
    df['source_entities'] = df['source'].apply(lambda x: nlp(x).label_)
    df['target_entities'] = df['target'].apply(lambda x: nlp(x).label_)
    return df
With
def ner(df):
    df['source_entities'] = df['source'].apply(lambda x: [ent.label_ for ent in nlp(x).ents])
    df['target_entities'] = df['target'].apply(lambda x: [ent.label_ for ent in nlp(x).ents])
    return df
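Note that the version above stores a list of labels per cell in new columns. If you want the layout from the expected outcome instead, where the label replaces the cell text, something along these lines might work. This is only a sketch: first_entity_label and label_columns are hypothetical helper names, the pipeline is passed in as nlp, and keeping the original text when no entity is found is an assumption on my part.

```python
import pandas as pd


def first_entity_label(text, nlp, default=None):
    """Return the label of the first entity found in `text`, else `default`."""
    ents = nlp(text).ents
    return ents[0].label_ if ents else default


def label_columns(df, nlp, columns=("source", "target")):
    """Return a copy of `df` with the given columns replaced by entity labels.

    Assumption: cells where no entity is detected keep their original text.
    """
    out = df.copy()
    for col in columns:
        out[col] = out[col].apply(
            lambda x: first_entity_label(x, nlp, default=x)
        )
    return out
```

You would call it as label_columns(df, nlp) with the nlp object loaded in the question. Be aware that the stock English models label organizations as ORG rather than ORGANIZATION and have no CAR label, and that a small model may find no entity at all in a single isolated word, so the exact expected table probably needs a custom label mapping or a custom-trained model.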