machine-learning, nlp, matching, similarity

Recommended algorithms for word similarity


I'm researching viable algorithms and solutions for the following problem: matching users based on their common interests.

Example:
U1: skiing, asian culture, meditation, java, crypto
U2: yoga, meditation, management, travel tips USA
U3: programming, travelling, oriental cuisine

I'm considering three dimensions of word similarity:

  • Dictionary synonyms
  • Close semantic similarity (programming > java, travelling > travel tips USA)
  • Loose semantic similarity (asian culture >> oriental cuisine, programming >> crypto, asian culture >> yoga, yoga >> meditation)

Based on these approaches, I would like to calculate a relevancy score and match users accordingly.
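A minimal Python sketch of one way such a relevancy score could be computed, under stated assumptions: the `RELATIONS` table is copied from the close (`>`) and loose (`>>`) examples above, identical strings stand in for the dictionary-synonym tier, and the tier weights 1.0 / 0.7 / 0.3 are arbitrary illustrative values. The names `similarity`, `directional_score` and `relevancy` are hypothetical. Each interest is scored by its best match in the other user's list, and the two directions are averaged so the score is symmetric.

```python
from itertools import combinations

# Illustrative tier weights (assumed, not from the question).
WEIGHTS = {"synonym": 1.0, "close": 0.7, "loose": 0.3}

# Hand-labelled relations copied from the question's examples. In practice
# these would come from WordNet, word embeddings, etc.
RELATIONS = {
    frozenset({"programming", "java"}): "close",
    frozenset({"travelling", "travel tips usa"}): "close",
    frozenset({"asian culture", "oriental cuisine"}): "loose",
    frozenset({"programming", "crypto"}): "loose",
    frozenset({"asian culture", "yoga"}): "loose",
    frozenset({"yoga", "meditation"}): "loose",
}


def similarity(a: str, b: str) -> float:
    """Tiered similarity between two interests (0.0 if unrelated)."""
    a, b = a.lower(), b.lower()
    if a == b:
        # Identical strings stand in for the dictionary-synonym tier here.
        return WEIGHTS["synonym"]
    tier = RELATIONS.get(frozenset({a, b}))
    return WEIGHTS[tier] if tier else 0.0


def directional_score(src: list[str], dst: list[str]) -> float:
    """Average, over src's interests, of each one's best match in dst."""
    return sum(max(similarity(a, b) for b in dst) for a in src) / len(src)


def relevancy(u: list[str], v: list[str]) -> float:
    """Symmetric relevancy: mean of both directional scores."""
    return (directional_score(u, v) + directional_score(v, u)) / 2


users = {
    "U1": ["skiing", "asian culture", "meditation", "java", "crypto"],
    "U2": ["yoga", "meditation", "management", "travel tips USA"],
    "U3": ["programming", "travelling", "oriental cuisine"],
}

# Rank every user pair from most to least relevant.
ranking = sorted(
    ((relevancy(users[x], users[y]), x, y) for x, y in combinations(users, 2)),
    reverse=True,
)
for score, x, y in ranking:
    print(f"{x} <-> {y}: {score:.3f}")
```

With these assumed weights, U1 and U3 rank highest (java/programming, crypto/programming and asian culture/oriental cuisine all contribute), just ahead of U1 and U2 (the shared "meditation"). Normalizing each direction by list length keeps users with many interests from scoring higher simply because they list more items.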

Thanks for the input!


Solution

  • Levenshtein distance was not very useful for capturing semantic similarity in my experiments.

    WordNet worked well but was slow for a large set of words.

    Word2Vec is a good approximation of WordNet but is not as comprehensive in capturing all related words.

    I also suggest looking at the graph embedding algorithm used in StarSpace from Facebook, especially the use case around Facebook page likes and recommendations.
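To illustrate why Levenshtein distance is a poor fit for this problem, here is a minimal Python sketch of the standard two-row dynamic-programming edit distance. The `levenshtein` function and the look-alike word "lava" are illustrative additions, not part of the answer above; the other words come from the question.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# Related interests from the question vs. a hypothetical look-alike word.
pairs = [("yoga", "meditation"), ("java", "programming"), ("java", "lava")]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {levenshtein(a, b)}")
```

The semantically related pairs come out far apart (yoga/meditation is 9 edits, java/programming is 10), while the unrelated java/lava is a single edit away. Edit distance measures spelling, not meaning, which is why WordNet or embedding-based similarity is needed for the close and loose tiers.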