The following link shows how to add multiple EntityRuler patterns with spaCy. The code to do that is below:
import spacy
import pandas as pd
from spacy.pipeline import EntityRuler
nlp = spacy.load('en_core_web_sm', disable = ['ner'])
ruler = nlp.add_pipe("entity_ruler")
flowers = ["rose", "tulip", "african daisy"]
for f in flowers:
    ruler.add_patterns([{"label": "flower", "pattern": f}])
animals = ["cat", "dog", "artic fox"]
for a in animals:
    ruler.add_patterns([{"label": "animal", "pattern": a}])
result = {}
doc = nlp("cat and artic fox, plant african daisy")
for ent in doc.ents:
    result[ent.label_] = ent.text
df = pd.DataFrame([result])
print(df)
The output:
      animal         flower
0  artic fox  african daisy
The problem is: how can I pass a DataFrame or table instead of the text "cat and artic fox, plant african daisy"?
Suppose your DataFrame is:
df = pd.DataFrame({'Text':["cat and artic fox, plant african daisy"]})
You can define a custom function to extract the entities and then use it with Series.apply:
def get_entities(x):
    result = {}
    doc = nlp(x)
    for ent in doc.ents:
        result[ent.label_] = ent.text
    return result
and then
df['Matches'] = df['Text'].apply(get_entities)
>>> df['Matches']
0    {'animal': 'artic fox', 'flower': 'african daisy'}
Name: Matches, dtype: object
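Note that with a plain dict, a later entity with the same label overwrites an earlier one, which is why "cat" does not show up next to "artic fox". If you want every match, one possible variation is to collect a list per label and then expand the dicts into one column per label. The sketch below is only an illustration under a few assumptions: it uses spacy.blank("en") instead of en_core_web_sm so that it runs without a model download, the helper name get_all_entities and the second example row are made up for the demo, and nlp.pipe is used to process the whole column in one batch.

```python
from collections import defaultdict

import pandas as pd
import spacy

# Assumption: a blank English pipeline is enough for rule-based matching,
# so no pretrained model is required for this sketch.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [{"label": "flower", "pattern": f} for f in ["rose", "tulip", "african daisy"]]
    + [{"label": "animal", "pattern": a} for a in ["cat", "dog", "artic fox"]]
)


def get_all_entities(doc):
    """Map each entity label to a list of every matched span in the doc."""
    result = defaultdict(list)
    for ent in doc.ents:
        result[ent.label_].append(ent.text)
    return dict(result)


# The second row is a hypothetical extra example.
df = pd.DataFrame(
    {"Text": ["cat and artic fox, plant african daisy", "a dog and a rose"]}
)

# nlp.pipe processes the texts as a stream instead of one call per row.
df["Matches"] = [get_all_entities(doc) for doc in nlp.pipe(df["Text"].tolist())]

# Expand the dicts into one column per label, aligned on the original index.
expanded = pd.DataFrame(df["Matches"].tolist(), index=df.index)
print(df.join(expanded))
```

Each cell in the new animal and flower columns holds a list, so rows with several matches for one label keep all of them.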