python python-3.x string nlp spacy

How can I use spaCy's built-in labels for each word?


What I would like to do

I would like to extract the words that spaCy, an open-source natural language processing library, tags with a specific entity label (see the list of specific labels on spaCy).

In the case below, I want to print the word English, because its label is LANGUAGE.

English

Problem

I couldn't find any sample code for extracting the label of each word.

How can I fix the error below?

TypeError: Argument 'string' has incorrect type (expected str, got spacy.tokens.token.Token)

Current Code

import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")

words = ['America', 'American', 'Christmas', 'English']
words = nlp(words)
for w in words:
    if w.label_=="LANGUAGE":
        print(w) #English

What I tried

I've checked each label and the sample code for visualization.

Additionally, non-sentence input was executable in the spaCy visualizer in the web browser.

Output

American NORP speaks English LANGUAGE and celebrates Christmas DATE in America GPE .

This code is from the sample code on the spaCy homepage:

import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")

text = "American speaks English and celebrates Christmas in America."
doc = nlp(text)
displacy.serve(doc, style="ent")
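With `style="ent"`, displaCy draws its boxes from `doc.ents`, so the same labels can also be read in code instead of in the browser. Below is a minimal sketch, assuming spaCy v3. So that it runs without downloading `en_core_web_sm`, it swaps the statistical model for a blank English pipeline with a rule-based `EntityRuler` whose patterns are copied from the output shown above; the helper `entity_labels` is hypothetical.

```python
import spacy

# Assumption (spaCy v3): a blank English pipeline plus a rule-based EntityRuler
# stands in for en_core_web_sm, so the labels are fixed by these patterns
# rather than predicted by a statistical model.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "NORP", "pattern": "American"},
    {"label": "LANGUAGE", "pattern": "English"},
    {"label": "DATE", "pattern": "Christmas"},
    {"label": "GPE", "pattern": "America"},
])


def entity_labels(doc):
    """Hypothetical helper: (text, label) for every named entity, in document order."""
    return [(ent.text, ent.label_) for ent in doc.ents]


doc = nlp("American speaks English and celebrates Christmas in America.")
labels = entity_labels(doc)
print(labels)

# Keep only the entities labelled LANGUAGE.
languages = [text for text, label in labels if label == "LANGUAGE"]
print(languages)
```

If you use `nlp = spacy.load("en_core_web_sm")` instead of the blank pipeline and ruler, the labels come from the model's predictions and may differ from the ones above.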

Solution

  • You can do it as follows:

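    import spacy
    nlp = spacy.load("en_core_web_sm")
    
    words = ['America', 'American', 'Christmas', 'English']
    # nlp() takes a single string, so join the words into short "sentences"
    doc = nlp('. '.join(words) + '.')
    # doc.ents holds the named entities (Span objects); label_ is the entity type
    for ent in doc.ents:
        if ent.label_ == "LANGUAGE":
            print(ent.text)  # English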
    

    Keep in mind that spaCy finds named entities by looking at the whole context and grammar of the text. For instance, a single named entity could be "The United States of America", which is five words. If you want to look at the words one by one, you need to give that "text" the right grammatical sense. That is why I separated the text (your list of words) with a period after each word.
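To illustrate the point about multi-word entities, here is a minimal sketch contrasting entity-level access through `doc.ents` with word-level access through `Token.ent_type_`. It assumes spaCy v3 and, so that it runs without a downloaded model, uses a blank English pipeline with a rule-based `EntityRuler` in place of `en_core_web_sm`; the patterns and the example sentence are made up for illustration.

```python
import spacy

# Assumption (spaCy v3): a blank English pipeline with a rule-based EntityRuler
# stands in for en_core_web_sm; the labels come from these patterns only.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "LANGUAGE", "pattern": "English"},
    {"label": "GPE", "pattern": "United States of America"},
])

doc = nlp("English is spoken in the United States of America.")

# Entity level: one Span per named entity, however many words it covers.
spans = [(ent.text, ent.label_) for ent in doc.ents]

# Word level: each token inside an entity carries that entity's label in
# Token.ent_type_ (an empty string means the token is not part of an entity).
tokens = [(token.text, token.ent_type_) for token in doc if token.ent_type_]

# A word-by-word filter can compare ent_type_ directly.
language_words = [token.text for token in doc if token.ent_type_ == "LANGUAGE"]

print(spans)
print(tokens)
print(language_words)
```

The word-level filter answers "which words carry this label?" directly, but it splits a multi-word entity such as "United States of America" into separate words, whereas `doc.ents` keeps it together as one span.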