
Attribute error creating a column of NER labels


I am trying to create columns in a dataframe that show the entities and labels from a spaCy model. So far, the following code produces a column of entities:

df['new_col'] = df['Combined'].apply(lambda x: list(ner_model(x).ents))

However, if I try the same for labels:

df['new_col1'] = df['Combined'].apply(lambda x: list(nlp(x).label_))

I get the following error:

AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'label_'

I suspect I might have to iterate over individual tokens, but I am not sure how to do this.


Solution

  • Iterate over the entities of each document and collect their labels:

    df['new_col1'] = df['Combined'].apply(lambda x: [ent.label_ for ent in nlp(x).ents])
    

    nlp(x) returns a Doc object, and a Doc has no label_ attribute, exactly as the error message says. The labels belong to the individual entities (Span objects) in doc.ents, so you need to iterate over nlp(x).ents and read label_ from each entity.
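If you want both columns, it may be worth running the pipeline once per row and reading the entity text and the labels from the same Doc, rather than calling the model twice. Below is a minimal sketch, assuming spaCy v3 and pandas. The blank English pipeline with an entity_ruler and its two patterns, the sample rows, and the column names entities and labels are all made up for illustration; in your case you would use your own trained model in place of nlp.

```python
import pandas as pd
import spacy

# Assumption: a blank pipeline with a rule-based entity_ruler stands in for
# the trained model (nlp / ner_model) from the question, so that no model
# download is needed to run this sketch.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": "London"},
])

# Hypothetical sample data using the 'Combined' column from the question
df = pd.DataFrame({
    "Combined": ["Apple opened an office in London", "Nothing to tag here"],
})

# Process every row once; nlp.pipe yields one Doc per input text
docs = list(nlp.pipe(df["Combined"]))

# Read the entity text and the entity labels from the same Doc objects
df["entities"] = [[ent.text for ent in doc.ents] for doc in docs]
df["labels"] = [[ent.label_ for ent in doc.ents] for doc in docs]

print(df)
```

Rows without any entities simply get empty lists in both columns, and the two lists in a row always line up position by position because they come from the same doc.ents.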