Tags: python, nlp, spacy, similarity, word-embedding

Calculate similarity between a given word and a random list of words


I want to calculate the similarity between a given word and a random list of words, and then rank the results in a new list. For example, given this list (it could be any other list):

words = ['bark','black','cat','bite','human','book']

how similar is each of these words to this word:

word = 'dog'

--

import spacy
nlp = spacy.load('en_core_web_md')


dog = nlp("dog")
bark = nlp("bark")
bite = nlp("bite")
human = nlp("human")
book = nlp("book")
cat = nlp("cat")
black = nlp("black")

print("dog - bark", dog.similarity(bark)) #0.4258176903285793
print("dog - bite", dog.similarity(bite)) #0.4781574605069981
print("dog - human", dog.similarity(human)) #0.35814872466230835
print("dog - book", dog.similarity(book)) #0.22838638167627964
print("dog - cat", dog.similarity(cat)) #0.8016854705531046
print("dog - black", dog.similarity(black)) #0.30601667459001575

So how can I calculate the similarity of every word in the list to the given word automatically?
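For context: spaCy's `Doc.similarity` defaults to the cosine similarity of the two docs' averaged word vectors, so for single words it effectively compares the two word vectors. The ranking asked for here therefore boils down to "compute one cosine score per word, then sort". A minimal pure-Python sketch of that idea follows; the helpers `cosine` and `rank` and the 3-dimensional `toy_vectors` are hypothetical and made up for illustration (the real en_core_web_md vectors have 300 dimensions, and the scores will not match spaCy's).

```python
import math


def cosine(u, v):
    """Cosine similarity dot(u, v) / (|u| * |v|); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(target, vectors):
    """Rank every other word in `vectors` by cosine similarity to vectors[target]."""
    t = vectors[target]
    scores = [(w, cosine(t, v)) for w, v in vectors.items() if w != target]
    # Highest similarity first, like the sort in the answer
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors, invented for illustration only
toy_vectors = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "bark": [0.5, 0.0, 0.5],
    "book": [0.0, 0.1, 0.9],
}

for word, score in rank("dog", toy_vectors):
    print(f"dog - {word}", round(score, 3))
```

Returning 0.0 for an all-zero vector loosely mirrors what spaCy does for words without a vector, which is why such words tend to sink to the bottom of a ranking.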


Solution

  • You can do something like this:

    import spacy
    nlp = spacy.load('en_core_web_md')
    
    words = ['bark','black','cat','bite','human','book']
    word = 'dog'
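    word_nlp = nlp(word)  # parse the target word once
    
    # Score every candidate against the target, then sort from most to least similar
    new_words = [(w, word_nlp.similarity(nlp(w))) for w in words]
    new_words.sort(key=lambda x: x[1], reverse=True)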
    
    for w, value in new_words:
        print(f"{word} - {w}", value)
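If the list is long, or may contain words the model has no vector for, a slightly more defensive variant could look like this. It is a sketch under assumptions: the helper name `rank_by_similarity` is hypothetical, it streams the candidates through `nlp.pipe` instead of calling `nlp()` once per word, and it skips any word whose `Doc` reports `has_vector == False`, because spaCy would otherwise score such a word 0.0 and emit a warning. The `nlp` object is passed in as a parameter, so the function itself does not import spaCy.

```python
def rank_by_similarity(nlp, target, candidates):
    """Return (word, score) pairs sorted from most to least similar to `target`.

    `nlp` is assumed to be a loaded spaCy pipeline (e.g. en_core_web_md).
    Candidates without a word vector are skipped rather than scored 0.0.
    """
    target_doc = nlp(target)
    scored = []
    # nlp.pipe processes the texts as a stream, which is usually faster
    # than calling nlp() separately for every word in a long list.
    for doc in nlp.pipe(candidates):
        if not doc.has_vector:
            continue  # out-of-vocabulary word: no meaningful similarity
        scored.append((doc.text, target_doc.similarity(doc)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    import spacy  # requires: python -m spacy download en_core_web_md

    nlp = spacy.load("en_core_web_md")
    words = ["bark", "black", "cat", "bite", "human", "book"]
    for w, value in rank_by_similarity(nlp, "dog", words):
        print(f"dog - {w}", value)
```

Whether to skip vectorless words or keep them with a score of 0.0 is a judgment call; skipping keeps the ranked list limited to words the model actually knows.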