I have a csv file with some columns including an id column and a text column.
Example source file: source_file
I would like to extract the entity text and label using spaCy, then write them to a dataframe together with the corresponding source id. A sentence may well contain more than one entity; those entities should all get the same id.
I thought the pandas apply function would be the best option for this, but I get an error. Can anybody tell me what I am doing wrong?
import pandas as pd
import spacy

df = pd.read_csv(r'data/test_data.csv')
nlp = spacy.load("nl_core_news_lg")
ner_entities = []

def get_entities(row):
    # Collect one [id, entity text, label] record per entity found in the row's text
    entity_id = row['id']
    text = row['text']
    doc = nlp(text)
    for ent in doc.ents:
        ner_entities.append([entity_id, ent.text, ent.label_])

df.apply(lambda row: get_entities(row))
ner_df = pd.DataFrame(ner_entities, columns=['id', 'ent', 'label'])
merged_df = pd.merge(df, ner_df, on='id', how='outer')
I get the following error message:
Just from the comment:
You need to set axis=1 when you want to apply a function to rows, so: df.apply(lambda row: get_entities(row), axis=1). Otherwise axis is set to 0 by default, and the function is applied to each column instead.
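To make the difference concrete, here is a minimal sketch that can be run without the spaCy model. The sample data and the fake_entities helper are assumptions made up for illustration: the helper simply treats capitalized words as entities with a placeholder label, standing in for nlp(text).ents from the question. It also returns the records from the function and flattens them afterwards, which is a small variation on the question's global list, not a requirement.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "text": ["Anna woont in Amsterdam", "geen entiteiten hier"],
})

def fake_entities(text):
    # Hypothetical stand-in for nlp(text).ents:
    # treats every capitalized word as an entity with the placeholder label "ENT".
    return [(word, "ENT") for word in text.split() if word[0].isupper()]

def get_entities(row):
    # With axis=1, `row` is a Series indexed by column name,
    # so row["id"] and row["text"] are valid lookups.
    return [[row["id"], ent_text, label]
            for ent_text, label in fake_entities(row["text"])]

# axis=1 passes each row to the function; the default axis=0 passes each column.
per_row = df.apply(get_entities, axis=1)

# Flatten the per-row lists into one list of [id, ent, label] records.
ner_entities = [record for records in per_row for record in records]
ner_df = pd.DataFrame(ner_entities, columns=["id", "ent", "label"])

# With the default axis=0 the function receives a whole column,
# which is indexed by row label and therefore has no "id" key.
try:
    df.apply(get_entities)
    default_axis_failed = False
except KeyError:
    default_axis_failed = True
```

In this sketch both entities found in the first sentence end up with id 1, which matches the requirement that several entities from one sentence share the source id. Swapping fake_entities for a loop over nlp(row["text"]).ents should give the real entity text and labels.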