python tensorflow word-embedding elmo

Calculate cosine similarity for elmo model


I am trying to calculate cosine similarity on the wordsim set using the ELMo model. This may not make much sense, since the model is designed to embed words within sentences, but I want to see how it performs in situations like this. The ELMo model I am using is from:

https://tfhub.dev/google/elmo/3

If I run the following code (modified from the documentation page to comply with TF 2.0), it generates the tensor representation of each word.

import tensorflow_hub as hub
import tensorflow as tf


elmo = hub.load("https://tfhub.dev/google/elmo/3")
tensor_of_strings = tf.constant(["Gray",
                                 "Quick",
                                 "Lazy"])
elmo.signatures['default'](tensor_of_strings)

If I try to calculate cosine similarity directly, I get the error: NotImplementedError: Cannot convert a symbolic Tensor (strided_slice_59:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported. I am not sure how to convert the Tensor to a NumPy array directly. Alternatively, is there a better metric for comparing tensors than cosine similarity?

Edit: This is what I did to calculate cosine similarity:

import numpy as np

def cos_sim(a, b):
    return np.inner(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Run the model once and compare the embeddings of the first two words
word_emb = elmo.signatures['default'](tensor_of_strings)['word_emb']
print("ELMo:", cos_sim(word_emb[0], word_emb[1]))
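The cosine computation itself can be checked independently of the model. The following is a minimal sketch, assuming the embeddings are already available as plain arrays (for an eager tensor `t`, `t.numpy()` would give one; whether the model output here is eager depends on the setup). The vectors `gray`, `quick`, and `lazy` are made-up stand-ins, not real ELMo output, and returning 0.0 for an all-zero vector is a convention chosen for this example.

```python
import numpy as np


def cos_sim(a, b):
    """Cosine similarity between two vectors given as array-likes."""
    # Flatten so that a (1, d) row and a (d,) vector are treated alike.
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Convention chosen for this sketch: similarity with a zero vector is 0.
        return 0.0
    return float(np.dot(a, b) / denom)


# Hypothetical stand-in vectors with the (1, d) shape of a one-token sentence.
gray = np.array([[1.0, 2.0, 3.0]])
quick = np.array([[2.0, 4.0, 6.0]])   # parallel to gray
lazy = np.array([[-3.0, 0.0, 1.0]])   # orthogonal to gray

print("gray vs quick:", cos_sim(gray, quick))
print("gray vs lazy:", cos_sim(gray, lazy))
```

Flattening the inputs avoids the (1, 1) matrix that `np.inner` would return for two (1, d) rows, so the result is a plain Python float.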

Solution

  • In this thread, "NotImplementedError: Cannot convert a symbolic Tensor (lstm_2/strided_slice:0) to a numpy array. T", the solution was to change the NumPy version (1.19.5 would perhaps be suitable).

    It would help to state the exact versions of Python, TensorFlow, and NumPy you are using.

    Also, as @Edwin Cheong mentioned in the comments, it is likely that you mixed NumPy and TensorFlow code in the loss function. It would help to include that code as well; in this thread, the issue was in how the loss function was computed or created: NotImplementedError: Cannot convert a symbolic Tensor (2nd_target:0) to a numpy array.
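To make the version information easy to report, something like the sketch below could be run in the same environment. It uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `collect_versions` is hypothetical, and the package names are assumptions: TensorFlow in particular may be installed under a different distribution name (for example a CPU-only build), in which case it would show up as "not installed" here.

```python
import sys
from importlib import metadata


def collect_versions(packages=("tensorflow", "tensorflow-hub", "numpy")):
    """Return a dict mapping 'python' and each package name to a version string."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this exact name.
            versions[name] = "not installed"
    return versions


for name, version in collect_versions().items():
    print(f"{name}: {version}")
```

Pasting this output into the question lets readers check whether the installed NumPy and TensorFlow versions are a combination known to trigger the error.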