Tags: python, pandas, entity, spacy, named-entity-recognition

How can I pass a table or DataFrame instead of text for entity recognition with spaCy?


The following link shows how to add multiple EntityRuler patterns with spaCy. The code to do that is below:

import spacy
import pandas as pd

# The statistical NER is disabled so that only the rule-based EntityRuler sets entities
nlp = spacy.load('en_core_web_sm', disable=['ner'])
ruler = nlp.add_pipe("entity_ruler")


flowers = ["rose", "tulip", "african daisy"]
for f in flowers:
    ruler.add_patterns([{"label": "flower", "pattern": f}])
animals = ["cat", "dog", "artic fox"]
for a in animals:
    ruler.add_patterns([{"label": "animal", "pattern": a}])



result={}
doc = nlp("cat and artic fox, plant african daisy")
for ent in doc.ents:
    # A later match overwrites an earlier one with the same label
    result[ent.label_] = ent.text
df = pd.DataFrame([result])
print(df)

The output:

      animal         flower
0  artic fox  african daisy

The problem is: how can I pass a DataFrame or table instead of the text "cat and artic fox, plant african daisy"?


Solution

  • Suppose your DataFrame is

    df = pd.DataFrame({'Text':["cat and artic fox, plant african daisy"]})
    

    You can define a custom function that extracts the entities from one string and then run it on every row with Series.apply:

    def get_entities(x):
        result = {}
        doc = nlp(x)
        for ent in doc.ents:
            result[ent.label_]=ent.text
        return result
    

    and then store the result in a new column:

    >>> df['Matches'] = df['Text'].apply(get_entities)
    >>> df['Matches']
    0    {'animal': 'artic fox', 'flower': 'african daisy'}
    Name: Matches, dtype: object
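
The approach above keeps only the last match per label and stores a dict in a single column. As a possible variation, the sketch below collects every match per label and expands the labels into separate columns. It is only an illustration under some assumptions: it uses `spacy.blank("en")` instead of `en_core_web_sm` (enough here because only the rule-based EntityRuler is involved), the second example row is made up, and `entities_by_label` is a hypothetical helper name. It also uses `nlp.pipe`, which processes texts in batches and tends to be faster than calling `nlp` once per row.

```python
import pandas as pd
import spacy

# Assumption: a blank English pipeline suffices, since only the EntityRuler is used
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [{"label": "flower", "pattern": f} for f in ["rose", "tulip", "african daisy"]]
    + [{"label": "animal", "pattern": a} for a in ["cat", "dog", "artic fox"]]
)

# The second row is an invented example to show more than one input text
df = pd.DataFrame({"Text": ["cat and artic fox, plant african daisy", "dog and rose"]})


def entities_by_label(doc):
    """Hypothetical helper: collect all matches per label, in document order."""
    found = {}
    for ent in doc.ents:
        found.setdefault(ent.label_, []).append(ent.text)
    return found


# nlp.pipe streams the texts through the pipeline in batches
matches = [entities_by_label(doc) for doc in nlp.pipe(df["Text"].tolist())]

# Build one column per label and attach the columns to the original rows
expanded = pd.DataFrame(matches, index=df.index)
result = df.join(expanded)
print(result)
```

With this version the first row keeps both "cat" and "artic fox" under `animal`, rather than only the last match.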