Tags: python-3.x, nlp, nltk, spacy, named-entity-recognition

NLP Named Entity Recognition using NLTK and spaCy


I ran NER on the following sentence with both NLTK and spaCy; the results are below:

"Zoni I want to find a pencil, a eraser and a sharpener"

I ran the following code on Google Colab.

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

ex = "Zoni I want to find a pencil, a eraser and a sharpener"

def preprocess(sent):
    # Tokenize the sentence, then attach a part-of-speech tag to each token.
    tokens = word_tokenize(sent)
    return pos_tag(tokens)

sent = preprocess(ex)
sent

#Output:
[('Zoni', 'NNP'),
 ('I', 'PRP'),
 ('want', 'VBP'),
 ('to', 'TO'),
 ('find', 'VB'),
 ('a', 'DT'),
 ('pencil', 'NN'),
 (',', ','),
 ('a', 'DT'),
 ('eraser', 'NN'),
 ('and', 'CC'),
 ('a', 'DT'),
 ('sharpener', 'NN')]
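
Note that the list above contains part-of-speech tags, not entity labels: pos_tag marks Zoni as NNP (proper noun) but does not say that it is a person. As a rough sketch, proper-noun candidates can be pulled out of that tagged list with the standard library alone. The variable names are illustrative, and the tagged list is copied from the output above.

```python
# Tagged tokens copied from the NLTK output above.
tagged = [('Zoni', 'NNP'), ('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'),
          ('find', 'VB'), ('a', 'DT'), ('pencil', 'NN'), (',', ','),
          ('a', 'DT'), ('eraser', 'NN'), ('and', 'CC'), ('a', 'DT'),
          ('sharpener', 'NN')]

# NNP and NNPS are the Penn Treebank tags for singular and plural proper nouns.
proper_nouns = [word for word, tag in tagged if tag in ('NNP', 'NNPS')]
print(proper_nouns)  # ['Zoni']
```

A proper-noun tag is only a hint that a token might be a name; it carries no entity type such as PERSON or ORG.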

But when I used spaCy on the same text, it did not return any entities:

import spacy
from spacy import displacy
from collections import Counter
import en_core_web_sm
nlp = en_core_web_sm.load()

text = "Zoni I want to find a pencil, a eraser and a sharpener"

doc = nlp(text)
doc.ents

#Output:
()

It only works for some sentences:

import spacy
from spacy import displacy
from collections import Counter
import en_core_web_sm
nlp = en_core_web_sm.load()

# text = "Zoni I want to find a pencil, a eraser and a sharpener"

text = 'European authorities fined Google a record $5.1 billion on Wednesday for abusing its power in the mobile phone market and ordered the company to alter its practices'

doc = nlp(text)
doc.ents

#Output:
(European, Google, $5.1 billion, Wednesday)

Please let me know if there is something wrong.


Solution

  • spaCy models are statistical, so the named entities that they recognize depend on the data sets they were trained on.

    According to the spaCy documentation, a named entity is a “real-world object” that’s assigned a name – for example, a person, a country, a product or a book title.

    For example, the name Zoni is not common, so the model does not recognize it as a named entity (person). If I change the name Zoni to William in your sentence, spaCy recognizes William as a person:

    import spacy
    nlp = spacy.load('en_core_web_lg')
    
    doc = nlp('William I want to find a pencil, a eraser and a sharpener')
    
    for entity in doc.ents:
        print(entity.label_, ' | ', entity.text)

    #Output:
    PERSON  |  William
    

    One might assume that pencil, eraser and sharpener would be classified as products, because the spaCy documentation describes the PRODUCT label as covering objects. That is not the case for the three objects in your sentence, which is consistent with the definition above: they are common nouns, not names assigned to specific products.

    Also note that if no named entities are found in the input text, doc.ents is empty, so you can check for that case explicitly:

    import spacy
    nlp = spacy.load("en_core_web_lg")
    
    doc = nlp('Zoni I want to find a pencil, a eraser and a sharpener')
    if not doc.ents:
        print('No named entities were recognized in the input text.')
    else:
        for entity in doc.ents:
            print(entity.label_, ' | ', entity.text)
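
If certain words must always be tagged regardless of what the statistical model learned, one possible workaround is a rule-based component. The sketch below assumes spaCy v3 and uses its built-in entity_ruler on a blank English pipeline, so no trained model is loaded. The patterns and the choice of the PERSON and PRODUCT labels are illustrative assumptions, not something the trained models do on their own.

```python
import spacy

# Blank English pipeline: tokenizer only, no statistical NER component.
nlp = spacy.blank("en")

# Rule-based entity matcher (spaCy v3 API). Patterns here are illustrative.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Zoni"},
    {"label": "PRODUCT", "pattern": [{"LOWER": "pencil"}]},
    {"label": "PRODUCT", "pattern": [{"LOWER": "eraser"}]},
    {"label": "PRODUCT", "pattern": [{"LOWER": "sharpener"}]},
])

doc = nlp("Zoni I want to find a pencil, a eraser and a sharpener")

# Collect (label, text) pairs in document order.
entities = [(ent.label_, ent.text) for ent in doc.ents]
for label, text in entities:
    print(label, ' | ', text)
```

To combine the rules with a trained model instead of a blank pipeline, the same component can be added to a loaded model with nlp.add_pipe("entity_ruler", before="ner"), so that the rule matches are set before the statistical component runs.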