python · tensorflow · keras · nlp · lstm

Input 0 of layer lstm_5 is incompatible with the layer: expected ndim=3, found ndim=2


I am trying to create an image captioning model. Could you please help with this error? input1 is the image vector, input2 is the caption sequence. 32 is the caption length. I want to concatenate the image vector with the embedding of the sequence and then feed it to the decoder model.


    def define_model(vocab_size, max_length):
      input1 = Input(shape=(512,))
      input1 = tf.keras.layers.RepeatVector(32)(input1)
      print(input1.shape)

      input2 = Input(shape=(max_length,))
      e1 = Embedding(vocab_size, 512, mask_zero=True)(input2)
      print(e1.shape)

      dec1 = tf.concat([input1,e1], axis=2)
      print(dec1.shape)

      dec2 = LSTM(512)(dec1)
      dec3 = LSTM(256)(dec2)
      dec4 = Dropout(0.2)(dec3)
      dec5 = Dense(256, activation="relu")(dec4)
      output = Dense(vocab_size, activation="softmax")(dec5)
      model = tf.keras.Model(inputs=[input1, input2], outputs=output)
      model.compile(loss="categorical_crossentropy", optimizer="adam")
      print(model.summary())
      return model

ValueError: Input 0 of layer lstm_5 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 512]

Solution

  • This error occurs when an LSTM layer receives a 2D input instead of the 3D input it expects. For instance:

    (64, 100)
    

    The correct format is (n_samples, time_steps, features):

    (64, 5, 100)
    

    In this case, the mistake was that the input of dec3, which is an LSTM layer, was the output of dec2, which is also an LSTM layer. By default, the argument return_sequences in an LSTM layer is False, so the first LSTM returns only its final hidden state, a 2D tensor, which is incompatible with the next LSTM layer. I fixed this by setting return_sequences=True in your first LSTM layer, so it returns the full 3D sequence of hidden states.
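
    As a minimal standalone check (with arbitrary small dummy sizes, not your real dimensions), you can see how return_sequences changes the output rank:

    ```python
    import tensorflow as tf
    from tensorflow.keras.layers import LSTM

    # Dummy batch: (n_samples, time_steps, features)
    x = tf.zeros((64, 5, 100))

    # Default return_sequences=False: only the final hidden state -> 2D
    print(LSTM(8)(x).shape)                         # (64, 8)

    # return_sequences=True: one hidden state per time step -> 3D,
    # which is what a stacked LSTM layer expects as its input
    print(LSTM(8, return_sequences=True)(x).shape)  # (64, 5, 8)
    ```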

    Also, there was an error in this line:

    model = tf.keras.Model(inputs=[input1, input2], outputs=output)
    

    input1 was not an input layer because you reassigned it. See:

    input1 = Input(shape=(512,))
    input1 = tf.keras.layers.RepeatVector(32)(input1)
    

    I renamed the second one to e0, consistent with how you name your other variables.

    Now, everything is working:

    import tensorflow as tf
    from tensorflow.keras.layers import *
    from tensorflow.keras import Input
    
    vocab_size, max_length = 1000, 32
    
    input1 = Input(shape=(128,))  # shape must be a tuple, hence the trailing comma
    e0 = tf.keras.layers.RepeatVector(32)(input1)
    print(input1.shape)
    
    input2 = Input(shape=(max_length,))
    e1 = Embedding(vocab_size, 128, mask_zero=True)(input2)
    print(e1.shape)
    
    dec1 = Concatenate()([e0, e1])
    print(dec1.shape)
    
    dec2 = LSTM(16, return_sequences=True)(dec1)
    dec3 = LSTM(16)(dec2)
    dec4 = Dropout(0.2)(dec3)
    dec5 = Dense(32, activation="relu")(dec4)
    output = Dense(vocab_size, activation="softmax")(dec5)
    model = tf.keras.Model(inputs=[input1, input2], outputs=output)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    print(model.summary())
    
    Model: "model_2"
    _________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to
    =================================================================================
    input_24 (InputLayer)           [(None, 128)]        0
    _________________________________________________________________________________
    input_25 (InputLayer)           [(None, 32)]         0
    _________________________________________________________________________________
    repeat_vector_12 (RepeatVector) (None, 32, 128)      0           input_24[0][0]
    _________________________________________________________________________________
    embedding_11 (Embedding)        (None, 32, 128)      128000      input_25[0][0]
    _________________________________________________________________________________
    concatenate_7 (Concatenate)     (None, 32, 256)      0           repeat_vector_12[0][0]
                                                                     embedding_11[0][0]
    _________________________________________________________________________________
    lstm_12 (LSTM)                  (None, 32, 16)       17472       concatenate_7[0][0]
    _________________________________________________________________________________
    lstm_13 (LSTM)                  (None, 16)           2112        lstm_12[0][0]
    _________________________________________________________________________________
    dropout_2 (Dropout)             (None, 16)           0           lstm_13[0][0]
    _________________________________________________________________________________
    dense_4 (Dense)                 (None, 32)           544         dropout_2[0][0]
    _________________________________________________________________________________
    dense_5 (Dense)                 (None, 1000)         33000       dense_4[0][0]
    =================================================================================
    Total params: 181,128
    Trainable params: 181,128
    Non-trainable params: 0
    _________________________________________________________________________________
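
    To confirm the wiring end to end, you can push a batch of random dummy data through the same architecture (condensed below; the batch size of 4 and the random values are arbitrary placeholders, not real image features or captions):

    ```python
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.layers import (Input, Embedding, RepeatVector,
                                         Concatenate, LSTM, Dropout, Dense)

    vocab_size, max_length = 1000, 32

    # Same model as above, condensed
    input1 = Input(shape=(128,))
    e0 = RepeatVector(max_length)(input1)
    input2 = Input(shape=(max_length,))
    e1 = Embedding(vocab_size, 128, mask_zero=True)(input2)
    x = Concatenate()([e0, e1])
    x = LSTM(16, return_sequences=True)(x)
    x = LSTM(16)(x)
    x = Dropout(0.2)(x)
    x = Dense(32, activation="relu")(x)
    output = Dense(vocab_size, activation="softmax")(x)
    model = tf.keras.Model(inputs=[input1, input2], outputs=output)

    # Dummy batch of 4 samples: random image vectors and integer caption indices
    # (indices start at 1 because 0 is the mask token)
    img = np.random.rand(4, 128).astype("float32")
    cap = np.random.randint(1, vocab_size, size=(4, max_length))

    preds = model.predict([img, cap], verbose=0)
    print(preds.shape)  # (4, 1000): one distribution over the vocabulary per sample
    ```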