python pandas nlp spacy-3

How to get entities from a text and match them to the id of the source file?


I have a CSV file with several columns, including an id column and a text column.

Example source file: source_file

I would like to extract the entity text and label using spaCy, then write them to a dataframe together with the corresponding source id. A sentence may well contain more than one entity; those entities should all share the same id.

desired_output

I thought that using the pandas apply function would be the best way to do this, but I get an error. Can anybody tell me what I am doing wrong?

import pandas as pd
import spacy

df = pd.read_csv(r'data/test_data.csv')
nlp = spacy.load("nl_core_news_lg")
ner_entities = []

def get_entities(row):
    entity_id = row['id']
    text = row['text']
    doc = nlp(text)
    for ent in doc.ents:
        ner_entities.append([entity_id, ent.text, ent.label_])

df.apply(lambda row: get_entities(row))
ner_df = pd.DataFrame(ner_entities, columns=['id', 'ent', 'label'])
merged_df = pd.merge(df, ner_df, on='id', how='outer')

I get the following error message:

error message


Solution

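  • From the comments:

    You need to set axis=1 when you want to apply a function to each row: df.apply(lambda row: get_entities(row), axis=1). By default axis=0, so apply passes each column to the function instead, and row['id'] fails because a column is indexed by row labels rather than by column names.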
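
    To illustrate the fix, here is a minimal sketch of a row-wise version. The helper name extract_entities is hypothetical, and it assumes the dataframe has the 'id' and 'text' columns from the question. It takes nlp as an argument, which would be the pipeline returned by spacy.load("nl_core_news_lg"). Instead of appending to a global list, each row returns its own list of entities, which is then flattened:

```python
import pandas as pd


def extract_entities(df, nlp):
    """Return one row per entity, tagged with the id of its source row.

    `nlp` is assumed to be a spaCy-style callable: calling it on a string
    returns an object whose `.ents` items expose `.text` and `.label_`.
    """

    def entities_for_row(row):
        # With axis=1, `row` is a Series indexed by column name,
        # so row["id"] and row["text"] resolve as expected.
        doc = nlp(row["text"])
        return [(row["id"], ent.text, ent.label_) for ent in doc.ents]

    # axis=1 applies the function to each row; the default axis=0
    # would pass each column instead.
    per_row = df.apply(entities_for_row, axis=1)

    # Flatten the per-row lists so a text with several entities
    # produces several output rows sharing the same id.
    records = [record for row_records in per_row for record in row_records]
    return pd.DataFrame(records, columns=["id", "ent", "label"])
```

    Calling ner_df = extract_entities(df, nlp) and then running the pd.merge(df, ner_df, on='id', how='outer') line from the question should give one output row per entity, while rows without any entity are kept by the outer merge.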