Regarding spaCy's part-of-speech tags


I am parsing data using spaCy. I need to extract all the nouns and adjectives, but I am getting unusual results for some words. For example, 'use' is tagged as NOUN instead of VERB, and I want 'Left' to be tagged as an adjective instead of a verb, the way 'Right' is in 'Right knee pain'. Is there a way to do this?

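import spacy

# In spaCy v2, 'en' was a shortcut link to this model; spaCy v3 requires the full package name
nlp = spacy.load('en_core_web_sm')
doc = nlp('Alcohol use. Left knee pain. Right knee pain')
for word in doc:
    print(word.text, word.pos_)  # token text and its coarse-grained POS tag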

Output:

    Alcohol NOUN
    use NOUN
    . PUNCT
    Left VERB
    knee NOUN
    pain NOUN
    . PUNCT
    Right ADJ
    knee NOUN
    pain NOUN
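Once each token has a coarse POS tag, extracting the nouns and adjectives is just a filter on that tag. The sketch below is a minimal illustration that works on the (text, POS) pairs printed above instead of a live spaCy `Doc`, so it runs without a model; with spaCy you would build the same pairs from `token.text` and `token.pos_`. The names `tagged`, `WANTED` and `nouns_and_adjectives` are hypothetical.

```python
# (text, pos) pairs copied from the output above; with spaCy these would be
# (token.text, token.pos_) for each token in doc
tagged = [
    ("Alcohol", "NOUN"), ("use", "NOUN"), (".", "PUNCT"),
    ("Left", "VERB"), ("knee", "NOUN"), ("pain", "NOUN"), (".", "PUNCT"),
    ("Right", "ADJ"), ("knee", "NOUN"), ("pain", "NOUN"),
]

# Coarse tags to keep (hypothetical choice; add "PROPN" if proper nouns are wanted too)
WANTED = {"NOUN", "ADJ"}


def nouns_and_adjectives(pairs):
    """Return, in order, the words whose POS tag is in WANTED."""
    return [text for text, pos in pairs if pos in WANTED]


print(nouns_and_adjectives(tagged))
```

Because 'Left' was tagged VERB, it is silently dropped from the result, which is why the tagging error matters for this task.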

Solution

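  • POS tagging is not 100% accurate; per-token accuracy is only around 97%, so occasional mistakes like these are to be expected. Also, the fragments you used for testing are ambiguous even for a human: 'Left' in 'Left knee pain' can also be read as the past tense of 'leave'. Note that 'use' in 'Alcohol use' really is a noun (the head of the phrase), so NOUN is the correct tag there.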

    Larger models such as en_core_web_md or en_core_web_lg are more accurate. In your case, you will get 'Left' as an adjective if you use either of them instead of the default 'en' model.
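If switching to a larger model is not an option, a different approach (not part of the answer above) is to correct the tags afterwards with a small hand-written rule. The sketch below is plain Python operating on (text, POS) pairs; the `LATERALITY` word set and the "directly precedes a NOUN" rule are hypothetical assumptions chosen for this example, not spaCy behaviour.

```python
# Hypothetical rule set: laterality words that should be adjectives
# when they directly modify a following noun (e.g. "Left knee pain").
LATERALITY = {"left", "right"}


def override_laterality(pairs):
    """Retag LATERALITY words as ADJ when the next token is tagged NOUN."""
    fixed = []
    for i, (text, pos) in enumerate(pairs):
        # POS of the following token, or None at the end of the sequence
        next_pos = pairs[i + 1][1] if i + 1 < len(pairs) else None
        if text.lower() in LATERALITY and next_pos == "NOUN":
            pos = "ADJ"
        fixed.append((text, pos))
    return fixed


tagged = [
    ("Left", "VERB"), ("knee", "NOUN"), ("pain", "NOUN"), (".", "PUNCT"),
    ("Right", "ADJ"), ("knee", "NOUN"), ("pain", "NOUN"),
]
print(override_laterality(tagged))
```

A rule like this only fixes the patterns you anticipate, so it complements rather than replaces a more accurate model. In spaCy v3 a comparable rule can be registered inside the pipeline with the built-in `attribute_ruler` component instead of post-processing the output.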