I am parsing some data using spaCy. I need to extract all the nouns and adjectives, but I am getting some unusual results for some words. For example, 'use' is tagged as NOUN instead of VERB. Also, I want 'Left' to be tagged as an adjective instead of a verb, the same way 'Right' is in 'Right knee pain'. Is there a way to do this?
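import spacy

# 'en' is the spaCy v2 shortcut for the small English model;
# spaCy v3 removed shortcuts, so there you would load 'en_core_web_sm'.
nlp = spacy.load('en')
doc = nlp('Alcohol use. Left knee pain. Right knee pain')
for word in doc:
    print(word.text, word.pos_)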
Output:
Alcohol NOUN
use NOUN
. PUNCT
Left VERB
knee NOUN
pain NOUN
. PUNCT
Right ADJ
knee NOUN
pain NOUN
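The accuracy of POS tagging is not 100%; it is only around 97% per token, so you should expect occasional errors like these. Also, the phrases you used for testing are ambiguous even for a human being: they are fragments without a main verb, so 'Left' can just as well be read as the past tense of 'leave'.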
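To see why occasional mistakes are expected even at that accuracy, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that every token is tagged correctly with the same probability of 0.97 and that errors are independent of each other. Real taggers do not behave this way (errors cluster on ambiguous words such as 'Left'), so treat the numbers as indicative only. The function names are hypothetical.

```python
# Rough estimate of how often a short input is tagged with no errors at all.
# Assumption: a fixed ~97% per-token accuracy and independent errors,
# which is a simplification of how real taggers behave.
TOKEN_ACCURACY = 0.97


def p_all_correct(n_tokens: int, acc: float = TOKEN_ACCURACY) -> float:
    """Probability that all n_tokens are tagged correctly."""
    return acc ** n_tokens


def expected_errors(n_tokens: int, acc: float = TOKEN_ACCURACY) -> float:
    """Expected number of mis-tagged tokens among n_tokens."""
    return n_tokens * (1 - acc)


# The example input above has 10 tokens (including the two full stops).
print(round(p_all_correct(10), 2))    # prints 0.74
print(round(expected_errors(10), 2))  # prints 0.3
```

Under these assumptions, roughly one in four inputs of this size would contain at least one tagging error.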
If you use a larger model such as en_core_web_md or en_core_web_lg, you will generally get better accuracy. In your case, you will get 'Left' as an adjective with either of these models instead of the default 'en' model.
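Since the end goal is to keep only the nouns and adjectives, here is a minimal sketch that loads the largest installed English model and filters tokens on their pos_ label. The helper names load_best_model and nouns_and_adjectives are hypothetical. The sketch assumes the models are installed as packages (for example with python -m spacy download en_core_web_lg) and relies only on spacy.load raising OSError when a model is missing.

```python
from typing import Iterable, List, Sequence, Tuple

# Universal POS labels (as found in spaCy's Token.pos_) that we want to keep.
WANTED_POS = {"NOUN", "ADJ"}


def nouns_and_adjectives(tagged: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the (text, pos) pairs whose POS label is NOUN or ADJ."""
    return [(text, pos) for text, pos in tagged if pos in WANTED_POS]


def load_best_model(
    names: Sequence[str] = ("en_core_web_lg", "en_core_web_md", "en_core_web_sm"),
):
    """Hypothetical helper: load the first installed model, largest first."""
    import spacy  # imported lazily so nouns_and_adjectives() works without spaCy

    for name in names:
        try:
            return spacy.load(name)
        except OSError:  # spacy.load raises OSError when a model package is missing
            continue
    raise OSError(f"none of these spaCy models is installed: {list(names)}")


# Usage (needs spaCy plus at least one of the models above):
#   nlp = load_best_model()
#   doc = nlp("Alcohol use. Left knee pain. Right knee pain")
#   print(nouns_and_adjectives((tok.text, tok.pos_) for tok in doc))
```

Trying the larger models first follows the suggestion above; the fallback to en_core_web_sm only keeps the sketch usable when the small model is the only one installed. Exact tags can still differ between model versions.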