python nlp nltk similarity wordnet

LCH Similarity - Need Same POS? Python


I am comparing WordNet similarity measures to see which is most useful for my corpus, and I got this error message when trying to compute lch_similarity:

"Computing the lch similarity requires Synset('home.n.01') and Synset('chronological.a.01') to have the same part of speech."

Do I have to normalize all words in my list to be the same POS before attempting lch?

For reference, I was able to compute wup_similarity successfully without any POS normalization. The words I am trying to calculate similarity for have all been lemmatized using WordNetLemmatizer.


Solution

  • You can't fix this by changing the part of speech, since not every word can be used as every part of speech. "chronological" can't be a noun, for example.

    One approach is to catch this exception, or to check up front whether the two synsets have different parts of speech, and assign a similarity of zero in that case.

    NLTK's WordNet interface also sometimes handles this by simulating a shared root across different parts of speech (the simulate_root argument of the similarity methods), which is probably why wup_similarity worked for you. However, the way that works is confusing, so you probably shouldn't rely on it.

    If you want similarity for arbitrary words, try using word vectors (Word2Vec or GloVe) instead of WordNet.
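
The part-of-speech check suggested above can be sketched as a small wrapper. This is only a minimal sketch: the name safe_lch and its default parameter are made up for illustration, and it assumes the arguments are NLTK Synset objects (anything exposing pos() and lch_similarity()). Returning zero for mismatched parts of speech is a convention, not something WordNet defines.

```python
def safe_lch(syn_a, syn_b, default=0.0):
    """Return the LCH similarity, or `default` when it is undefined.

    syn_a and syn_b are assumed to be NLTK WordNet synsets, e.g.
    wn.synset('home.n.01') after `from nltk.corpus import wordnet as wn`.
    safe_lch is a hypothetical helper, not part of NLTK.
    """
    # lch_similarity raises when the parts of speech differ,
    # so compare them first and fall back to the default score.
    if syn_a.pos() != syn_b.pos():
        return default

    score = syn_a.lch_similarity(syn_b)

    # The similarity methods can return None when no path is found;
    # treat that the same way as a part-of-speech mismatch.
    return default if score is None else score
```

With this, a call such as safe_lch(wn.synset('home.n.01'), wn.synset('chronological.a.01')) returns the default instead of raising. Whether zero is the right fallback depends on how the scores are used later; passing default=None keeps "undefined" distinct from "dissimilar".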