Tags: python, tensorflow, keras, deep-learning, lstm

Input a 4 channel RGB-D Image into LSTM


I have read a sequence of images (frames) into a NumPy array of shape (9135, 200, 200, 4), where 9135 is the number of samples and each frame is a 200 x 200 image with 4 channels (R, G, B, depth).

I have a sequential model with an LSTM layer:

  x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], x_train.shape[2], x_train.shape[3], 1))
  # (9135, 200, 200, 4, 1)
  x_val = np.reshape(x_val, (x_val.shape[0], x_val.shape[1], x_val.shape[2], x_val.shape[3], 1))
  # (3046, 200, 200, 4, 1)

  model = Sequential()
  model.add(TimeDistributed(Conv2D(64, (3,3), activation='relu'), input_shape=(200, 200, 4)))

  model.add(TimeDistributed(Conv2D(64, (3,3), activation='relu')))
  model.add(TimeDistributed(GlobalAveragePooling2D()))
  model.add(LSTM(1024, activation='relu', return_sequences=False))
  model.add(Dense(1024, activation='relu'))
  model.add(Dropout(.5))
  model.add(Dense(10, activation='sigmoid'))
  model.compile('adam', loss='categorical_crossentropy')
  model.summary()

  history = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=verbose, validation_data=(x_val, y_val))

but fitting the model raises this error:

ValueError: Input 0 of layer conv2d is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: [None, 200, 4]

What is the suggested way to input a 4 channel image into an LSTM layer in Keras?

PS: Each class has a different number of frames, so I do not know how to handle a variable number of timesteps.


Solution

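  • TimeDistributed expects input of shape (samples, timesteps, height, width, channels), so you need to reshape the data to add a time axis after the sample axis:

      x_train = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1], x_train.shape[2], x_train.shape[3]))
      # (9135, 1, 200, 200, 4)
      x_val = np.reshape(x_val, (x_val.shape[0], 1, x_val.shape[1], x_val.shape[2], x_val.shape[3]))
      # (3046, 1, 200, 200, 4)

    and change the input_shape of the first layer to input_shape=(None, 200, 200, 4). The None leaves the number of timesteps unspecified.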
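
To make the reshape and the input_shape change concrete, here is a minimal sketch rather than a definitive implementation. It assumes TensorFlow 2.x with its bundled Keras. The frame size (32 x 32), the layer widths, and the random data are hypothetical stand-ins chosen so that it runs quickly. The softmax output is my substitution for the sigmoid in the question, since softmax is the usual pairing with categorical_crossentropy. The `clips` grouping is also an assumption that is not part of the original answer.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Assumed toy sizes; the question uses 200 x 200 frames and wider layers.
HEIGHT, WIDTH, CHANNELS, NUM_CLASSES = 32, 32, 4, 10


def build_model():
    """Apply a small CNN to every frame, then run an LSTM over the time axis."""
    return tf.keras.Sequential([
        # None leaves the number of timesteps unspecified.
        tf.keras.Input(shape=(None, HEIGHT, WIDTH, CHANNELS)),
        layers.TimeDistributed(layers.Conv2D(8, (3, 3), activation="relu")),
        layers.TimeDistributed(layers.Conv2D(8, (3, 3), activation="relu")),
        layers.TimeDistributed(layers.GlobalAveragePooling2D()),
        layers.LSTM(16),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])


model = build_model()
model.compile("adam", loss="categorical_crossentropy")

# Random stand-in for real data: 6 frames with 4 channels each.
frames = np.random.rand(6, HEIGHT, WIDTH, CHANNELS).astype("float32")

# As in the answer: every frame becomes a sequence of length 1.
single_step = frames[:, np.newaxis]                      # (6, 1, 32, 32, 4)

# Alternative (assumption): group consecutive frames into clips of 3 frames.
clips = frames.reshape(2, 3, HEIGHT, WIDTH, CHANNELS)    # (2, 3, 32, 32, 4)

out_single = model.predict(single_step, verbose=0)
out_clips = model.predict(clips, verbose=0)
print(out_single.shape, out_clips.shape)  # (6, 10) (2, 10)
```

Both inputs go through the same model because the time dimension is None. Note that with a timestep axis of length 1 the LSTM sees only one frame per sample, so it has no temporal context to learn from. If the frames belong to ordered sequences, grouping them into clips as in the second variant may be closer to what was intended. Within one batch all sequences must still have the same length, so sequences of different lengths would need padding or one batch per length.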