Tags: python, pandas, for-loop, spacy, named-entity-recognition

How to append Named Entities extracted from a DataFrame?


The following code, which extracts and then prints entities from df['Article'], works just fine:

for i in df['Article'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        print((entity.text))

But whenever I try to append these entities using entities_list.append((entity.text)), I get the error TypeError: object of type 'float' has no len(). I have tried creating entities_list = [] in the following way:

entities_list = []
for i in df['Article'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        print((entity.text))

as well as:

for i in df['Article'].to_list():
    entities_list = []
    doc = nlp(i)
    for entity in doc.ents:
        print((entity.text))

Even if I try to create another DataFrame or add a new column to df, I get the same error. Can someone help me see what I am doing wrong here? Thank you.

EDIT:
The data in df['Article'] is news text like:

Pence’s move comes as inoculation efforts are unfurling around the world in the race to halt a pandemic that has claimed at least 1.66 million lives and infected more than 74 million people.

The very first code block prints the entities extracted from the text, but I need those entities appended to a list, like the following:

[entity1, entity2, entity3, entity4]

Solution

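  • It seems that the column Article has some missing values. pandas represents a missing value as NaN, which is a float, so nlp() receives a float instead of a string, which would explain the TypeError. Replace the missing values with empty strings before processing: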

    entities_list = []
    # fillna('') replaces missing values (NaN) with empty strings, so nlp() always receives a str
    for i in df['Article'].fillna('').to_list():
        doc = nlp(i)
        for entity in doc.ents:
            entities_list.append(entity.text)
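
An alternative to filling the gaps is to skip every value that is not a string. The sketch below is one possible way to do that, not the only one. The helper name extract_entities is hypothetical (it does not come from pandas or spaCy), and it assumes that nlp is any callable returning an object with an ents attribute, as a loaded spaCy pipeline does.

```python
def extract_entities(texts, nlp):
    """Collect entity texts from every string in `texts`.

    Hypothetical helper: `nlp` is assumed to be a callable (such as a loaded
    spaCy pipeline) that returns an object exposing `.ents`.
    """
    entities_list = []
    for text in texts:
        # Missing values show up as NaN (a float) or None; skip anything that is not a str
        if not isinstance(text, str):
            continue
        doc = nlp(text)
        # Each entity exposes its surface string through `.text`
        entities_list.extend(entity.text for entity in doc.ents)
    return entities_list
```

With the DataFrame from the question, this would be called as entities_list = extract_entities(df['Article'], nlp). Unlike fillna(''), it never passes empty strings to the pipeline, and it also tolerates other non-string values in the column.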