I would like to extract the words that have been given a specific label by spaCy, the open-source natural language processing library.
In the case below, I want to print the word English because the label LANGUAGE is selected.
English
I could not find any sample code for extracting the label of each word.
How can I fix the error below?
TypeError: Argument 'string' has incorrect type (expected str, got spacy.tokens.token.Token)
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")
words = ['America', 'American', 'Christmas', 'English']
words = nlp(words)
for w in words:
    if w.label_=="LANGUAGE":
        print(w) #English
I've checked each label and the sample code for visualization.
Additionally, non-sentence input was executable in the spaCy visualizer in the web browser.
Output:
American NORP speaks English LANGUAGE and celebrates Christmas DATE in America GPE .
The code below is taken from the sample code on the spaCy homepage:
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")
text = "American speaks English and celebrates Christmas in America."
doc = nlp(text)
displacy.serve(doc, style="ent")
You can do it as follows:
import spacy
nlp = spacy.load("en_core_web_sm")
words = ['America', 'American', 'Christmas', 'English']
words = nlp('. '.join(words) + '.')
for w in words.ents:
    if w.label_=="LANGUAGE":
        print(w.text) #English
Keep in mind that spaCy finds named entities by looking at the whole context and grammar of the text. For instance, a single named entity could be "The United States of America", which spans five words. If you want to process the input word by word, you need to give that "text" a sensible grammatical structure. That is why I separated the text (your list of words) by putting a period after each word.
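To make the difference between entity spans and individual tokens more concrete, here is a minimal sketch. It assumes spaCy v3 and, so that the result does not depend on what a trained model happens to predict, it uses a blank English pipeline with a rule-based `entity_ruler` instead of `en_core_web_sm`. The two patterns are illustrative assumptions of mine, not labels taken from the model. The points it shows are that `nlp()` takes a single string rather than a list, that the spans in `doc.ents` expose `label_`, and that a `Token` exposes its entity label as `ent_type_` instead.

```python
import spacy

# Blank English pipeline plus a rule-based entity ruler (spaCy v3 API).
# These patterns are hand-written assumptions for illustration only;
# a trained model such as en_core_web_sm predicts labels statistically.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "LANGUAGE", "pattern": "English"},
    {"label": "GPE", "pattern": "America"},
])

words = ["America", "American", "Christmas", "English"]

# nlp() expects a str, so join the list into one text first.
doc = nlp(". ".join(words) + ".")

# Span level: doc.ents yields Span objects, which have .label_
span_matches = [ent.text for ent in doc.ents if ent.label_ == "LANGUAGE"]

# Token level: a Token has no .label_; its entity label is .ent_type_
token_matches = [tok.text for tok in doc if tok.ent_type_ == "LANGUAGE"]

print(span_matches)
print(token_matches)
```

With `en_core_web_sm` loaded in place of the blank pipeline, the same two list comprehensions should work unchanged, although which words receive which label then depends on the model.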