
Pre-trained embedding layer: tf.constant with unsupported shape


I am going to use pre-trained word embeddings in a Keras model. My matrix weights are stored in `matrix.w2v.wv.vectors.npy` and it has shape (150854, 100).

Now, when I add the embedding layer to the Keras model with the following parameters:

model.add(Embedding(5000, 100,
    embeddings_initializer=keras.initializers.Constant(emb_matrix),
    input_length=875, trainable=False))

I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-61-8731e904e60a> in <module>()
      1 model = Sequential()
      2 
----> 3 model.add(Embedding(5000, 100,
            embeddings_initializer=keras.initializers.Constant(emb_matrix),
            input_length=875, trainable=False))
      4 model.add(Conv1D(128, 10, padding='same', activation='relu'))
      5 model.add(MaxPooling1D(10))

22 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    323     raise TypeError("Eager execution of tf.constant with unsupported shape "
    324                     "(value has %d elements, shape is %s with %d elements)." %
--> 325                     (num_t, shape, shape.num_elements()))
    326 
    327 

TypeError: Eager execution of tf.constant with unsupported shape (value has 15085400 elements, shape is (5000, 100) with 500000 elements).

Kindly tell me where I am making a mistake.


Solution

  • Your embedding layer expects a vocabulary of 5,000 words and therefore initializes an embedding matrix of shape 5000×100. However, the word2vec model that you are trying to load has a vocabulary of 150,854 words, so the constant initializer's 150854×100 matrix does not fit.

    You either need to increase the capacity of the embedding layer to cover the full vocabulary, or truncate the embedding matrix to keep only the most frequent words.
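
    Both options can be sketched as follows. This is a minimal illustration, assuming the vectors were loaded with `np.load` (a stand-in array of the same shape is used here, since the `.npy` file is not available) and that, as in gensim's word2vec, rows are ordered from most to least frequent word:

    ```python
    import numpy as np

    # In your code: emb_matrix = np.load("matrix.w2v.wv.vectors.npy")
    emb_matrix = np.zeros((150854, 100))  # stand-in with the same shape

    # Option 1: size the layer to the full vocabulary.
    vocab_size, emb_dim = emb_matrix.shape  # 150854, 100

    # Option 2: keep only the 5,000 most frequent words. gensim stores
    # vectors in descending frequency order, so slicing the first rows
    # keeps the most frequent words.
    truncated = emb_matrix[:5000]
    print(truncated.shape)  # (5000, 100)
    ```

    In either case, the Embedding layer's `input_dim` must match the row count of the matrix you pass to the initializer, e.g. `Embedding(truncated.shape[0], emb_dim, embeddings_initializer=keras.initializers.Constant(truncated), input_length=875, trainable=False)`. Note that with option 2, your tokenizer must also be limited to those same 5,000 word indices.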