I am trying to create columns in a dataframe that show the entities and labels from a spaCy model. So far, with the following code, I can produce a column of entities:
df['new_col'] = df['Combined'].apply(lambda x: list(ner_model(x).ents))
However, if I try the same for labels:
#df['new_col1'] = df['Combined'].apply(lambda x: list(nlp(x).label_))
I get the error "AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'label_'".
I suspect I might have to iterate over individual tokens, but I am not sure how to do this.
You need to do something like this:
df['new_col1'] = df['Combined'].apply(lambda x: [ent.label_ for ent in nlp(x).ents])
The output of nlp(x) is a Doc object, and a Doc has no label_ attribute (as the error message states explicitly). The labels belong to the entities in the Doc, which is why you need to iterate over nlp(x).ents and read label_ from each entity.
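If you want both columns, it may be worth running the model only once per row and reusing the resulting Doc for the entities and the labels. Here is a minimal sketch of that idea. Note the assumptions: it uses a blank English pipeline with a rule-based entity_ruler and two made-up patterns as a stand-in for your trained model, the sample dataframe is invented for illustration, and it stores the entity text (ent.text) rather than the Span objects your original code keeps.

```python
import pandas as pd
import spacy

# Assumption: a blank pipeline plus an entity_ruler stands in for the
# trained model (ner_model / nlp) from the question (spaCy v3 API).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},   # hypothetical pattern
    {"label": "GPE", "pattern": "London"},  # hypothetical pattern
])

# Invented sample data; only the column name comes from the question.
df = pd.DataFrame(
    {"Combined": ["Apple opened an office in London", "Nothing to tag here"]}
)

# Process every text once, then build both columns from the same Doc objects.
docs = list(nlp.pipe(df["Combined"]))
df["new_col"] = [[ent.text for ent in doc.ents] for doc in docs]
df["new_col1"] = [[ent.label_ for ent in doc.ents] for doc in docs]

print(df)
```

With your own model you would drop the entity_ruler setup and call your loaded pipeline instead. Rows with no entities simply get an empty list in both columns.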