How to evaluate word embeddings quality using AvgSimC and MaxSimC


I am working on a topical word embeddings project, where I need to evaluate the quality of word embeddings with respect to the multiple senses of a word. I have seen some research papers use AvgSimC and MaxSimC. As I understand it, both methods determine the sense of a word from its context words. Unfortunately, I could not find a clear description of how they are implemented, or source code for these two methods.

Source code (Python or C) implementing AvgSimC and MaxSimC on the SCWS data set, or any documentation, tutorials, or references, would be much appreciated.

Thank you for your valuable time.


Solution

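  • For two word vectors word1 and word2 in Python (with only one vector per word, both functions reduce to the plain cosine similarity, so the mean and max have no effect):

       import numpy as np
       from scipy import spatial

       def AvgSimC(word1, word2):
           # cosine similarity = 1 - cosine distance between the two vectors
           cosine_similarity = 1 - spatial.distance.cosine(word1, word2)
           return np.mean(cosine_similarity)

       def MaxSimC(word1, word2):
           # same similarity; the max over a single value is that value
           cosine_similarity = 1 - spatial.distance.cosine(word1, word2)
           return np.max(cosine_similarity)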
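
The snippet above compares one vector per word, so it does not yet use the context. As these measures are usually described for multi-sense (or topical) embeddings, each word has several sense vectors plus a probability for each sense given the word's context: AvgSimC averages the similarity over all sense pairs, weighted by those probabilities, and MaxSimC compares only the most probable sense of each word. Below is a minimal sketch under that reading, not a reference implementation. The function names, the argument layout (`senses*` as a K x d array of sense vectors, `probs*` as K context-conditional probabilities) and the toy numbers are my own assumptions, and I omit the constant 1/K² factor that some formulations include, since it does not change rankings.

```python
import numpy as np

def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a (K1 x d) and b (K2 x d)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

def avg_sim_c(senses1, probs1, senses2, probs2):
    """Probability-weighted average similarity over all sense pairs.

    probs1[i] is assumed to be P(sense i | word1, context1), and likewise
    for probs2; how these are estimated depends on the embedding model.
    """
    sims = cosine_matrix(senses1, senses2)
    p1 = np.asarray(probs1, dtype=float)
    p2 = np.asarray(probs2, dtype=float)
    return float(p1 @ sims @ p2)

def max_sim_c(senses1, probs1, senses2, probs2):
    """Similarity between the single most probable sense of each word."""
    i = int(np.argmax(probs1))
    j = int(np.argmax(probs2))
    s1 = np.asarray(senses1, dtype=float)[i:i + 1]
    s2 = np.asarray(senses2, dtype=float)[j:j + 1]
    return float(cosine_matrix(s1, s2)[0, 0])

# Toy example (made-up numbers): two senses per word in a 2-d space.
senses_a = [[1.0, 0.0], [0.0, 1.0]]
probs_a = [0.9, 0.1]
senses_b = [[1.0, 0.0], [0.0, 1.0]]
probs_b = [0.2, 0.8]

print(avg_sim_c(senses_a, probs_a, senses_b, probs_b))
print(max_sim_c(senses_a, probs_a, senses_b, probs_b))
```

To evaluate on SCWS, you would compute one such score per word pair and compare the scores with the human ratings, typically using Spearman rank correlation (for example `scipy.stats.spearmanr`).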