python · tensorflow · keras · stanford-nlp · word-embedding

TensorFlow 2 GloVe: "could not broadcast input array" when preparing the embedding matrix (not the usual +1 issue)


I get a `ValueError: could not broadcast input array from shape (50) into shape (100)` while preparing the embedding matrix. I have loaded GloVe and built the word-to-vector mapping: `Found 400000 word vectors.`

I did look at a bunch of similar questions, but they all seem to deal with forgetting to add the +1 to the max number of words. I think I have that covered, but I still have the issue. Any help deeply appreciated.

num_words = min(MAX_NUM_WORDS, len(word2idx_inputs) + 1)

I also tried

num_words = min(MAX_NUM_WORDS, len(word2idx_inputs)) + 1

Using pre-trained word embeddings in a keras model?

I also tried this one:

Keras word embeddings Glove: can't prepare the embedding matrix

but that also turned out to be the +1 issue.

FYI: extreme newbie at this; first time doing seq-to-seq, translating Tagalog into English.

The error I receive:


Filling pre-trained embeddings...

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-acf0d8a4c4ca> in <module>
     8     if embedding_vector is not None:
     9       # words not found in embedding index will be all zeros.
---> 10       embedding_matrix[i] = embedding_vector
    11 
    12 # create embedding layer

ValueError: could not broadcast input array from shape (50) into shape (100)
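For context, the error itself can be reproduced in isolation: NumPy refuses to assign a length-50 vector into a matrix row of width 100. A minimal sketch (the names here are placeholders, not the variables from the code below):

```python
import numpy as np

# Stand-ins: `matrix` plays the role of embedding_matrix (width 100),
# `vector` plays the role of a 50-dimensional GloVe vector.
matrix = np.zeros((10, 100))
vector = np.zeros(50)

error_message = ''
try:
    matrix[0] = vector  # shapes disagree, so NumPy cannot broadcast
except ValueError as err:
    error_message = str(err)

print(error_message)  # "could not broadcast input array from shape ..."
```

The row stays untouched when the assignment fails, which is why the mismatch surfaces as an exception rather than silently truncated data.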

Code


# prepare embedding matrix
print('Filling pre-trained embeddings...')
num_words = min(MAX_NUM_WORDS, len(word2idx_inputs) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word2idx_inputs.items():
    if i < MAX_NUM_WORDS:
        embedding_vector = word2vec.get(word)
        if embedding_vector is not None:
            # words not found in embedding index will be all zeros.
            embedding_matrix[i] = embedding_vector

# create embedding layer
embedding_layer = Embedding(
    num_words,
    EMBEDDING_DIM,
    weights=[embedding_matrix],
    input_length=max_len_input,
    # trainable=True
)

# create targets, since we cannot use sparse
# categorical cross entropy when we have sequences
decoder_targets_one_hot = np.zeros(
    (
        len(input_texts),
        max_len_target,
        num_words_output
    ),
    dtype='float32'
)

# assign the values
for i, d in enumerate(decoder_targets):
    for t, word in enumerate(d):
        if word != 0:
            decoder_targets_one_hot[i, t, word] = 1



Solution

  • Check the EMBEDDING_DIM value. The pre-trained GloVe file you loaded almost certainly contains 50-dimensional vectors, while EMBEDDING_DIM is set to 100; that is exactly what the error reports, shape (50) being broadcast into shape (100). Set EMBEDDING_DIM = 50, or load the 100-dimensional GloVe file instead.
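One way to avoid this class of bug entirely is to infer the dimension from the vectors you actually loaded instead of hard-coding it. A sketch, assuming `word2vec` is the word-to-array dict built while parsing the GloVe file (the two entries below are stand-in data):

```python
import numpy as np

# Stand-in for the dict produced by parsing glove.6B.50d.txt;
# in the real code every value is a length-50 float array.
word2vec = {
    'hello': np.random.rand(50),
    'world': np.random.rand(50),
}

# Infer the embedding width from the first loaded vector, so
# embedding_matrix is always allocated with the right number of columns.
EMBEDDING_DIM = len(next(iter(word2vec.values())))
print(EMBEDDING_DIM)  # 50
```

With this in place, switching between the 50d, 100d, 200d, and 300d GloVe files requires no other code change.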