How can I make spaCy case insensitive?
Is there a code snippet I should add? My patterns are all lowercase, and I can't get entities whose casing differs from them (here "CAT" and "Artic fox" are not found):
import spacy
import pandas as pd
from spacy.pipeline import EntityRuler
nlp = spacy.load('en_core_web_sm', disable=['ner'])
ruler = nlp.add_pipe("entity_ruler")
flowers = ["rose", "tulip", "african daisy"]
for f in flowers:
    ruler.add_patterns([{"label": "flower", "pattern": f}])
animals = ["cat", "dog", "artic fox"]
for a in animals:
    ruler.add_patterns([{"label": "animal", "pattern": a}])
result={}
doc = nlp("CAT and Artic fox, plant african daisy")
for ent in doc.ents:
    result[ent.label_] = ent.text
df = pd.DataFrame([result])
print(df)
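A minimal sketch of why the code above misses "CAT" and "Artic fox": by default the entity ruler's phrase patterns compare the exact token text (the ORTH attribute), so only "african daisy", whose casing matches its lowercase pattern, is found. It assumes spaCy v3 and uses `spacy.blank("en")` (tokenizer only) instead of `en_core_web_sm` so it runs without downloading a model.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components required.
nlp = spacy.blank("en")
# No phrase_matcher_attr given, so phrase patterns match the exact text (ORTH).
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "flower", "pattern": "african daisy"},
    {"label": "animal", "pattern": "cat"},
    {"label": "animal", "pattern": "artic fox"},
])

doc = nlp("CAT and Artic fox, plant african daisy")
found = [(ent.text, ent.label_) for ent in doc.ents]
# Only the phrase whose casing matches its pattern exactly is found.
print(found)
```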
As long as it's okay for LOWER to be used for all patterns, you can keep using phrase patterns and set the phrase_matcher_attr option on the entity ruler. Then you don't have to worry about tokenizing the phrases, and if you have a lot of patterns to match, it will also be faster than using token patterns:
import spacy
nlp = spacy.load('en_core_web_sm', disable=['ner'])
ruler = nlp.add_pipe("entity_ruler", config={"phrase_matcher_attr": "LOWER"})
flowers = ["rose", "tulip", "african daisy"]
for f in flowers:
    ruler.add_patterns([{"label": "flower", "pattern": f}])
animals = ["cat", "dog", "artic fox"]
for a in animals:
    ruler.add_patterns([{"label": "animal", "pattern": a}])
doc = nlp("CAT and Artic fox, plant african daisy")
for ent in doc.ents:
    print(ent, ent.label_)
Output:
CAT animal
Artic fox animal
african daisy flower
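For comparison, here is a sketch of the token-pattern alternative: each phrase is tokenized with the pipeline's own tokenizer (`nlp.make_doc`) and every token becomes a `{"LOWER": ...}` constraint. It assumes spaCy v3 and uses `spacy.blank("en")` so no model download is needed; the `phrases` dict is just an illustrative way to group the question's terms.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Illustrative grouping of the question's terms by label.
phrases = {
    "flower": ["rose", "tulip", "african daisy"],
    "animal": ["cat", "dog", "artic fox"],
}

patterns = []
for label, terms in phrases.items():
    for term in terms:
        # Tokenize the phrase the same way the text will be tokenized,
        # then require each token's lowercase form to match.
        tokens = nlp.make_doc(term)
        patterns.append({"label": label, "pattern": [{"LOWER": t.lower_} for t in tokens]})
ruler.add_patterns(patterns)

doc = nlp("CAT and Artic fox, plant african daisy")
# Keep one (text, label) pair per entity.
found = [(ent.text, ent.label_) for ent in doc.ents]
for text, label in found:
    print(text, label)
```

This is more work than setting phrase_matcher_attr, but token patterns let you choose the attribute per token (e.g. LOWER for some words, ORTH for others) if only part of a phrase should ignore case. Collecting one pair per entity, as here, also keeps both animal matches; the question's `result` dict keyed by label would keep only the last one ("Artic fox").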