
Sentence similarity - How to calculate the depth of subsumer using WordNet?


I am trying to build a tool that calculates the similarity between two words, and I found the following formula, which comes from Manchester Metropolitan University:

Formula for word similarity from Manchester research group
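
For readers who cannot see the image, the word similarity measure in the linked paper has, to the best of my knowledge, the form below. Here l is the shortest path length between the two words, h is the depth of their subsumer, and α and β are tuning constants; please check the exact form and the constant values against the paper itself.

```latex
s(w_1, w_2) = e^{-\alpha l} \cdot \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}}
```

The second factor is simply tanh(βh), so a deeper subsumer (larger h) pushes the similarity closer to the value given by the path-length term alone.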

I am still confused about how to get h, the depth of the subsumer in the hierarchical semantic net. As I understand it, h is the path length from the top word down to a certain word, and according to the authors the top word is 'entity' for nouns. But what about other kinds of words, such as adjectives, adverbs, and verbs? And once we have the top word, how can we list the path from it to the word we need to calculate?

The paper is at the following link: https://www.researchgate.net/profile/Keeley_Crockett/publication/232645326_Sentence_Similarity_Based_on_Semantic_Nets_and_Corpus_Statistics/links/0deec51b8db68f19fa000000.pdf

I would really appreciate any answer. Thanks.


Solution

  • I would like to add some details that I have just found. They were enough for my own purposes and may not match the question above exactly, but I think they are worth sharing for anyone who needs them in the future.

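    1. 'entity' is the root of the noun hierarchy, and words that are usually thought of as verbs (such as 'kiss' or 'kick') can still be traced back to it through their noun senses. Verb synsets have their own hypernym hierarchies without a single shared root, and WordNet does not arrange adjectives and adverbs in a hypernym hierarchy.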

      • Example: full path for the word 'kiss': ROOT#n#1 < entity#n#1 < abstraction#n#6 < psychological_feature#n#1 < event#n#1 < act#n#2 < touch#n#5 < kiss#n#1
      • Example: full path for the word 'kick': ROOT#n#1 < entity#n#1 < abstraction#n#6 < psychological_feature#n#1 < event#n#1 < act#n#2 < speech_act#n#1 < objection#n#2 < kick#n#4
    2. To calculate the depth of any word, we count from the top word ('entity') down to that word, following the WordNet hierarchical database.

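    Coming back to the example above, h (the depth of the subsumer of 'kiss' and 'kick') is 6. This is the number of nodes counted from the top node of the tree (ROOT#n#1) down to 'act' (act#n#2), which is the deepest node that the two paths share.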
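
The counting above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the two paths are hard-coded from the examples above rather than read from WordNet, each word is given a single hypernym path (a real synset may have several), and the function name `lowest_common_subsumer` is my own. Depth is counted in nodes, including the virtual ROOT#n#1 node, to match the value of 6 above.

```python
# Hypernym paths copied from the examples above, listed root first.
KISS_PATH = [
    "ROOT#n#1", "entity#n#1", "abstraction#n#6", "psychological_feature#n#1",
    "event#n#1", "act#n#2", "touch#n#5", "kiss#n#1",
]
KICK_PATH = [
    "ROOT#n#1", "entity#n#1", "abstraction#n#6", "psychological_feature#n#1",
    "event#n#1", "act#n#2", "speech_act#n#1", "objection#n#2", "kick#n#4",
]


def lowest_common_subsumer(path_a, path_b):
    """Return (node, depth) of the deepest node shared by two root-first paths.

    Depth is the number of nodes from the root down to the shared node,
    counting both ends. Returns (None, 0) if the paths share no prefix.
    """
    subsumer, depth = None, 0
    for position, (node_a, node_b) in enumerate(zip(path_a, path_b), start=1):
        if node_a != node_b:
            break  # the paths diverge here; keep the last shared node
        subsumer, depth = node_a, position
    return subsumer, depth


subsumer, h = lowest_common_subsumer(KISS_PATH, KICK_PATH)
print(subsumer, h)  # act#n#2 6
```

To work with real WordNet data instead of hard-coded paths, NLTK's `Synset.hypernym_paths()` and `Synset.lowest_common_hypernyms()` expose the same information. Its depth convention may differ from the node count used here, so the numbers should be compared with care.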