Tags: python, nlp, spacy, named-entity-recognition

How to identify body part names in a text with python


I am trying to determine whether an entity is a body part. For example, in "Other specified disorders of the right ear," I want to be able to identify "right ear" as an entity. I tried some named entity recognition methods, but they identify all entities, not just the body parts. I tried scispacy, but I have not managed it so far. I also tried concise_concepts with spaCy to create a separate entity type for body parts, but that didn't work either. Please guide me through how I can do that; a code snippet would be appreciated.


Solution

  • This code, based on this answer, detects almost everything correctly, but it needs lemmatization: inflected forms such as "feet" are not recognized.

    import nltk
    from nltk.corpus import wordnet as wn

    # Download the WordNet data (only needed once)
    nltk.download('wordnet')

    # Synset for the general concept "body part"
    part = wn.synsets('body_part')[0]

    def is_body_part(candidate):
        """Return True if some WordNet sense of `candidate` falls under body_part."""
        for ss in wn.synsets(candidate):
            # only get those where the synset matches exactly
            name = ss.name().split(".", 1)[0]
            if name != candidate:
                continue
            # if body_part is the lowest common hypernym of the two synsets,
            # then ss is body_part itself or one of its descendants
            hit = part.lowest_common_hypernyms(ss)
            if hit and hit[0] == part:
                return True
        return False

    # true things
    for word in ("finger", "hand", "head", "feet", 'foot', 'hair'):
        print(is_body_part(word), word, sep="\t")

    # false things
    for word in ("cat", "dog", "fish", "cabbage", "knife"):
        print(is_body_part(word), word, sep="\t")
    

    Output:

    True    finger
    True    hand
    True    head
    False   feet # have to lemmatize it to foot
    True    foot
    True    hair
    False   cat
    False   dog
    False   fish
    False   cabbage
    False   knife