
SparseTermSimilarityMatrix().inner_product() throws "cannot unpack non-iterable bool object"


While working with cosine similarity, I ran into a problem calculating the inner product of two vectors.

Code:

import gensim.downloader as api
from gensim.similarities import (
    WordEmbeddingSimilarityIndex,
    SparseTermSimilarityMatrix
)

# `dictionary` is a gensim Dictionary built earlier (not shown here)
w2v_model         = api.load("glove-wiki-gigaword-50")
similarity_index  = WordEmbeddingSimilarityIndex(w2v_model)
similarity_matrix = SparseTermSimilarityMatrix(similarity_index, dictionary)

score = similarity_matrix.inner_product(
    X = [
        (0, 1), (1, 1), (2, 1), (3, 2), (4, 1), 
        (5, 1), (6, 1), (7, 1), (8, 1), (9, 1), 
        (10, 1), (11, 1), (12, 1), (13, 1), (14, 1), 
        (15, 1), (16, 3)
    ], 
    Y = [(221, 1), (648, 1), (8238, 1)], 
    normalized = True
)

Error:

TypeError                                 Traceback (most recent call last)
Input In [77], in <cell line: 1>()
----> 1 similarity_matrix.inner_product(
      2     [(0, 1), (1, 1), (2, 1), (3, 2), (4, 1), (5, 1), (6, 1), (7, 1), 
      3      (8, 1), (9, 1), (10, 1), (11, 1), (12, 1), (13, 1), (14, 1), (15, 1), (16, 3)], 
      4     [(221, 1), (648, 1), (8238, 1)], normalized=True)

File ~\Anaconda3\lib\site-packages\gensim\similarities\termsim.py:558, in SparseTermSimilarityMatrix.inner_product(self, X, Y, normalized)
    555 if not X or not Y:
    556     return self.matrix.dtype.type(0.0)
--> 558 normalized_X, normalized_Y = normalized
    559 valid_normalized_values = (True, False, 'maintain')
    561 if normalized_X not in valid_normalized_values:

TypeError: cannot unpack non-iterable bool object

I am not sure why the error mentions a bool object when both X and Y are lists.


Solution

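  • The error refers to the normalized argument, not to X or Y. As the traceback shows, line 558 of termsim.py unpacks normalized into normalized_X and normalized_Y, so a single bool such as True cannot be unpacked. The normalized parameter must be a 2-tuple that specifies normalization for X and Y separately (as in the docs); per the traceback, each element may be True, False, or 'maintain'.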

    Therefore, the call should look like this:

    score = similarity_matrix.inner_product(
        X = [
            (0, 1), (1, 1), (2, 1), (3, 2), (4, 1), 
            (5, 1), (6, 1), (7, 1), (8, 1), (9, 1), 
            (10, 1), (11, 1), (12, 1), (13, 1), (14, 1), 
            (15, 1), (16, 3)
        ], 
        Y = [(221, 1), (648, 1), (8238, 1)], 
        normalized = (True, True)
    )
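
To see why the message mentions a bool, the unpacking step from the traceback can be reproduced in plain Python without gensim. The sketch below is only an illustration: `unpack_error_message` and `as_normalized_pair` are hypothetical helper names, not part of gensim, and the second one assumes you want the same setting applied to both X and Y when a single flag is given.

```python
def unpack_error_message(value):
    """Try the same two-name unpacking that inner_product performs."""
    try:
        normalized_x, normalized_y = value
    except TypeError as exc:
        return str(exc)
    return None  # unpacking succeeded


def as_normalized_pair(normalized):
    """Hypothetical helper (not a gensim function): coerce a single flag
    such as True or 'maintain' into the (X, Y) pair inner_product expects."""
    if isinstance(normalized, (bool, str)):
        return (normalized, normalized)
    normalized_x, normalized_y = normalized  # raises if not a 2-item iterable
    return (normalized_x, normalized_y)


# A bare bool cannot be unpacked into two names, hence the TypeError.
error_message = unpack_error_message(True)
print(error_message)

# A 2-tuple unpacks without error.
print(unpack_error_message((True, True)))

# Coercing a single flag into a pair before calling inner_product.
print(as_normalized_pair(True))
print(as_normalized_pair((True, "maintain")))
```

With such a helper, a call could be written as `similarity_matrix.inner_product(X, Y, normalized=as_normalized_pair(True))`, although passing `(True, True)` directly is simpler and is what the docs describe.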