python · tensorflow · keras · nlp · lstm

Input 0 of layer lstm_5 is incompatible with the layer: expected ndim=3, found ndim=2


I am trying to create an image captioning model. Could you please help with this error? input1 is the image vector, input2 is the caption sequence. 32 is the caption length. I want to concatenate the image vector with the embedding of the sequence and then feed it to the decoder model.


    def define_model(vocab_size, max_length):
      input1 = Input(shape=(512,))
      input1 = tf.keras.layers.RepeatVector(32)(input1)
      print(input1.shape)

      input2 = Input(shape=(max_length,))
      e1 = Embedding(vocab_size, 512, mask_zero=True)(input2)
      print(e1.shape)

      dec1 = tf.concat([input1,e1], axis=2)
      print(dec1.shape)

      dec2 = LSTM(512)(dec1)
      dec3 = LSTM(256)(dec2)
      dec4 = Dropout(0.2)(dec3)
      dec5 = Dense(256, activation="relu")(dec4)
      output = Dense(vocab_size, activation="softmax")(dec5)
      model = tf.keras.Model(inputs=[input1, input2], outputs=output)
      model.compile(loss="categorical_crossentropy", optimizer="adam")
      print(model.summary())
      return model

ValueError: Input 0 of layer lstm_5 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 512]

Solution

  • This error occurs when an LSTM layer receives a 2D input instead of the 3D input it expects. For instance:

    (64, 100)
    

    The correct format is (n_samples, time_steps, features):

    (64, 5, 100)
    

    In this case, the mistake was that the input of dec3, which is an LSTM layer, was the output of dec2, which is also an LSTM layer. By default, the argument return_sequences in an LSTM layer is False, so the first LSTM returns only its final hidden state, a 2D tensor, which is incompatible with the next LSTM layer. I fixed this by setting return_sequences=True in your first LSTM layer, so it returns the full 3D sequence of hidden states.
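
    As a minimal standalone check (with arbitrary small dummy sizes, not your real dimensions), you can see how return_sequences changes the output rank:

    ```python
    import tensorflow as tf
    from tensorflow.keras.layers import LSTM

    # Dummy batch: (n_samples, time_steps, features)
    x = tf.zeros((64, 5, 100))

    # Default return_sequences=False: only the final hidden state -> 2D
    print(LSTM(8)(x).shape)                         # (64, 8)

    # return_sequences=True: one hidden state per time step -> 3D,
    # which is what a stacked LSTM layer expects as its input
    print(LSTM(8, return_sequences=True)(x).shape)  # (64, 5, 8)
    ```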

    Also, there was an error in this line:

    model = tf.keras.Model(inputs=[input1, input2], outputs=output)
    

    input1 was not an input layer because you reassigned it. See:

    input1 = Input(shape=(512,))
    input1 = tf.keras.layers.RepeatVector(32)(input1)
    

    I renamed the second one to e0, consistent with how you name your other variables.

    Now, everything is working:

    import tensorflow as tf
    from tensorflow.keras.layers import *
    from tensorflow.keras import Input
    
    vocab_size, max_length = 1000, 32
    
    input1 = Input(shape=(128,))  # shape must be a tuple, hence the trailing comma
    e0 = tf.keras.layers.RepeatVector(32)(input1)
    print(input1.shape)
    
    input2 = Input(shape=(max_length,))
    e1 = Embedding(vocab_size, 128, mask_zero=True)(input2)
    print(e1.shape)
    
    dec1 = Concatenate()([e0, e1])
    print(dec1.shape)
    
    dec2 = LSTM(16, return_sequences=True)(dec1)
    dec3 = LSTM(16)(dec2)
    dec4 = Dropout(0.2)(dec3)
    dec5 = Dense(32, activation="relu")(dec4)
    output = Dense(vocab_size, activation="softmax")(dec5)
    model = tf.keras.Model(inputs=[input1, input2], outputs=output)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    print(model.summary())
    
    Model: "model_2"
    _________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to
    =================================================================================
    input_24 (InputLayer)           [(None, 128)]        0
    _________________________________________________________________________________
    input_25 (InputLayer)           [(None, 32)]         0
    _________________________________________________________________________________
    repeat_vector_12 (RepeatVector) (None, 32, 128)      0           input_24[0][0]
    _________________________________________________________________________________
    embedding_11 (Embedding)        (None, 32, 128)      128000      input_25[0][0]
    _________________________________________________________________________________
    concatenate_7 (Concatenate)     (None, 32, 256)      0           repeat_vector_12[0][0]
                                                                     embedding_11[0][0]
    _________________________________________________________________________________
    lstm_12 (LSTM)                  (None, 32, 16)       17472       concatenate_7[0][0]
    _________________________________________________________________________________
    lstm_13 (LSTM)                  (None, 16)           2112        lstm_12[0][0]
    _________________________________________________________________________________
    dropout_2 (Dropout)             (None, 16)           0           lstm_13[0][0]
    _________________________________________________________________________________
    dense_4 (Dense)                 (None, 32)           544         dropout_2[0][0]
    _________________________________________________________________________________
    dense_5 (Dense)                 (None, 1000)         33000       dense_4[0][0]
    =================================================================================
    Total params: 181,128
    Trainable params: 181,128
    Non-trainable params: 0
    _________________________________________________________________________________
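
    To confirm the wiring end to end, you can push a batch of random dummy data through the same architecture (condensed below; the batch size of 4 and the random values are arbitrary placeholders, not real image features or captions):

    ```python
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.layers import (Input, Embedding, RepeatVector,
                                         Concatenate, LSTM, Dropout, Dense)

    vocab_size, max_length = 1000, 32

    # Same model as above, condensed
    input1 = Input(shape=(128,))
    e0 = RepeatVector(max_length)(input1)
    input2 = Input(shape=(max_length,))
    e1 = Embedding(vocab_size, 128, mask_zero=True)(input2)
    x = Concatenate()([e0, e1])
    x = LSTM(16, return_sequences=True)(x)
    x = LSTM(16)(x)
    x = Dropout(0.2)(x)
    x = Dense(32, activation="relu")(x)
    output = Dense(vocab_size, activation="softmax")(x)
    model = tf.keras.Model(inputs=[input1, input2], outputs=output)

    # Dummy batch of 4 samples: random image vectors and integer caption indices
    # (indices start at 1 because 0 is the mask token)
    img = np.random.rand(4, 128).astype("float32")
    cap = np.random.randint(1, vocab_size, size=(4, max_length))

    preds = model.predict([img, cap], verbose=0)
    print(preds.shape)  # (4, 1000): one distribution over the vocabulary per sample
    ```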