python nlp vectorization similarity spacy

How to apply spaCy's document similarity function to an array of strings to get an array of scores?

I have to compare one spaCy document to a list of spaCy documents and want to get a list of similarity scores as output. Of course, I can do this with a for loop, but I'm looking for an optimized solution, something like the broadcasting that numpy offers.

I have one document against a list of documents:

oneDoc = 'Hello, I want to be compared with a list of documents'
listDocs = ["I'm the first one", "I'm the second one"]

spaCy offers us a document similarity function:

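import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # assumption: any model that ships word vectors
oneDoc = nlp(oneDoc)
listDocs = list(nlp.pipe(listDocs))  # nlp() takes a single string; nlp.pipe handles a list
similarity_score = np.zeros(len(listDocs))
for i, doc in enumerate(listDocs):
    similarity_score[i] = oneDoc.similarity(doc)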

Since one document is compared with a list of two documents, the result is an array of two scores, for example [0.7, 0.8] (illustrative values).

I'm looking for a way to avoid this for loop. In other words, I want to vectorize this function.

Solution

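Use nlp.pipe to process all of your text documents. Grab the embedding (.vector) from each document and stack the vectors into a matrix. Then apply a pairwise distance function with cosine as the metric to get the full matrix of scores. NumPy itself does not provide such a function; scikit-learn's pairwise_distances and SciPy's cdist both do. Note that these return cosine distance, so the similarity is 1 - distance.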
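
For the one-against-many case in the question, the same idea can be sketched with plain numpy. This is a minimal sketch, not the only way to do it: the helper names cosine_scores and similarity_scores are made up for illustration, and it assumes the default Doc.similarity behaviour (cosine similarity of the two .vector arrays) and a pipeline that ships word vectors. Documents with an all-zero vector get a score of 0 here, which is a choice of this sketch.

```python
import numpy as np


def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between one vector and every row of a matrix."""
    query_vec = np.asarray(query_vec, dtype=float)
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    # One matrix-vector product replaces the Python-level loop.
    dots = doc_vecs @ query_vec
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    # Rows with a zero norm keep a score of 0 instead of dividing by zero.
    scores = np.zeros_like(dots)
    np.divide(dots, norms, out=scores, where=norms > 0)
    return scores


def similarity_scores(nlp, query, texts):
    """Hypothetical helper: score `query` against every string in `texts`.

    `nlp` is assumed to be a loaded spaCy pipeline with word vectors.
    """
    query_vec = nlp(query).vector
    # nlp.pipe processes the texts as a batch; each Doc exposes .vector.
    doc_vecs = np.stack([doc.vector for doc in nlp.pipe(texts)])
    return cosine_scores(query_vec, doc_vecs)


# Usage, assuming a vector model such as en_core_web_md is installed:
#   import spacy
#   nlp = spacy.load("en_core_web_md")
#   scores = similarity_scores(nlp, oneDoc, listDocs)
```

The parsing step in nlp.pipe is usually the expensive part, so the gain from vectorizing the similarity itself mostly shows up when the list of documents is large.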