I am trying to determine whether an entity is a body part. For example, in "Other specified disorders of the right ear," I want to identify "right ear" as an entity. The named entity recognition methods I tried identify all entities, not just body parts. I tried scispacy without success, and I also tried concise_concepts with spaCy to create a separate entity type for body parts, but that didn't work either. Please guide me through how to do this; a code snippet would be appreciated.
The code below, based on this answer, detects almost everything correctly, but it needs lemmatization to handle inflected forms such as plurals.
import nltk
from nltk.corpus import wordnet as wn

nltk.download('wordnet')

# reference synset that every body part should descend from
part = wn.synsets('body_part')[0]

def is_body_part(candidate):
    """Return True if any exact-name synset of candidate is a kind of body part."""
    for ss in wn.synsets(candidate):
        # only get those where the synset matches exactly
        name = ss.name().split(".", 1)[0]
        if name != candidate:
            continue
        # body_part is an ancestor when it is the lowest common hypernym
        hit = part.lowest_common_hypernyms(ss)
        if hit and hit[0] == part:
            return True
    return False

# true things
for word in ("finger", "hand", "head", "feet", 'foot', 'hair'):
    print(is_body_part(word), word, sep="\t")

# false things
for word in ("cat", "dog", "fish", "cabbage", "knife"):
    print(is_body_part(word), word, sep="\t")
Output:
True finger
True hand
True head
False feet # have to lemmatize it to foot
True foot
True hair
False cat
False dog
False fish
False cabbage
False knife
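
One possible way to close the "feet" gap and to get from single words to a phrase like "right ear" is sketched below. It is only a sketch under some assumptions: the helper names `load_wordnet_checker` and `find_body_parts` and the `LATERALITY` word list are made up for illustration, the tokenizer is a plain regex rather than a real NLP pipeline, and lemmatization relies on WordNet's own `wn.morphy` lookup instead of a separate lemmatizer. The WordNet check itself is the same one as above.

```python
import re

# Assumed list of side modifiers; extend it as needed. These words are never
# reported on their own, only attached to the body part that follows them.
LATERALITY = {"left", "right"}


def load_wordnet_checker():
    """Build (lemmatize, is_body_part) helpers backed by NLTK's WordNet."""
    import nltk  # imported lazily so find_body_parts works with any helpers
    from nltk.corpus import wordnet as wn

    nltk.download("wordnet", quiet=True)
    part = wn.synsets("body_part")[0]

    def lemmatize(word):
        # morphy returns WordNet's noun base form, or None if it finds none
        return wn.morphy(word.lower(), wn.NOUN) or word.lower()

    def is_body_part(lemma):
        for ss in wn.synsets(lemma, pos=wn.NOUN):
            # keep only synsets whose own name is exactly the lemma
            if ss.name().split(".", 1)[0] != lemma:
                continue
            hit = part.lowest_common_hypernyms(ss)
            if hit and hit[0] == part:
                return True
        return False

    return lemmatize, is_body_part


def find_body_parts(text, lemmatize, is_body_part):
    """Return body-part mentions in text, with a preceding side word if any."""
    tokens = re.findall(r"[A-Za-z]+", text)
    found = []
    for i, tok in enumerate(tokens):
        if tok.lower() in LATERALITY:
            continue  # a modifier, not a body part by itself
        if is_body_part(lemmatize(tok)):
            if i > 0 and tokens[i - 1].lower() in LATERALITY:
                found.append(f"{tokens[i - 1]} {tok}")
            else:
                found.append(tok)
    return found
```

With WordNet available, calling `lemmatize, is_body_part = load_wordnet_checker()` and then `find_body_parts("Other specified disorders of the right ear", lemmatize, is_body_part)` is meant to pick out the ear together with its side, and "feet" should now be matched through its lemma "foot". Multi-word anatomy terms and anything missing from WordNet would still need a domain-specific resource.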