python spacy semantics similarity

How to find semantic similarity between a list of words?


Input:

listToStr = 'degeneration agents alpha alternative amd analysis angiogenesis anti anti vegf appears associated based best bevacizumab blindness blood'

Code I am using:

simi = []
tokens = nlp(listToStr) 
length = len(tokens)

for i in range(length):
    #print(i)
    sim = tokens[i].similarity(tokens[i+1])
    simi.append(sim)
print(simi)

Error:

[E040] Attempt to access token at 17, max length 17.

How can I remove this error?

I am using spaCy. Here's the link to it: https://www.geeksforgeeks.org/python-word-similarity-using-spacy/#:~:text=Python%20%7C%20Word%20Similarity%20using%20spaCy,simple%20method%20for%20this%20task.


Solution

  • Inside the for loop, tokens[i + 1] goes out of range on the last iteration: when i is 16 (the last valid index of the 17 tokens), tokens[i + 1] asks for token 17, which does not exist. You could do something like this instead:

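    import spacy

    # Note: en_core_web_sm has no static word vectors, so its similarity
    # scores are rough; a model with vectors such as en_core_web_md is more reliable.
    nlp = spacy.load("en_core_web_sm")

    listToStr = 'degeneration agents alpha alternative amd analysis angiogenesis anti anti vegf appears associated based best bevacizumab blindness blood'

    simi = []
    tokens = nlp(listToStr)

    for idx, tok in enumerate(tokens):
        # Compare this token with every token after it
        # (the slice starts at idx + 1 so a token is not compared with itself).
        sim = []
        for nextok in tokens[idx + 1:]:
            sim.append(tok.similarity(nextok))
        simi.append(sim)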
    

    This tests the similarity of each word with each word that follows it in the sentence, so the result is a list of lists: simi[i] holds the scores for the i-th token.
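
If only the similarity between neighbouring words is wanted, as in the loop from the question, a minimal sketch is to stop the index one short of the end so that tokens[i + 1] is always valid. The helper name adjacent_similarities and the StubToken class are hypothetical, introduced here only so the sketch runs without a spaCy model installed; it assumes nothing about the tokens beyond a similarity() method and support for len() and indexing.

```python
def adjacent_similarities(tokens):
    """Return the similarity of each token with the token right after it."""
    # Stop at len(tokens) - 1 so that tokens[i + 1] never runs past the end.
    return [tokens[i].similarity(tokens[i + 1]) for i in range(len(tokens) - 1)]


class StubToken:
    """Hypothetical stand-in for a spaCy Token, so the sketch runs without a model."""

    def __init__(self, text):
        self.text = text

    def similarity(self, other):
        # Toy score: 1.0 for identical text, 0.0 otherwise (not a real semantic measure).
        return 1.0 if self.text == other.text else 0.0


stub_tokens = [StubToken(w) for w in "anti anti vegf appears".split()]
stub_scores = adjacent_similarities(stub_tokens)
print(stub_scores)  # [1.0, 0.0, 0.0]
```

With the data from the question, the call would presumably be simi = adjacent_similarities(nlp(listToStr)), which gives 16 scores for the 17 tokens, one per neighbouring pair.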