Tags: python, keras, deep-learning, keras-layer

How to add additional data to CNN+LSTM network


I have the following network (a pretrained CNN + LSTM to classify videos):

  frames, channels, rows, columns = 5, 3, 224, 224

  video = Input(shape=(frames, rows, columns, channels))

  cnn_base = VGG16(input_shape=(rows, columns, channels),
                   weights="imagenet",
                   include_top=True)  # <=== include_top=True
  cnn_base.trainable = False

  cnn = Model(cnn_base.input, cnn_base.layers[-3].output, name="VGG_fm")  # -3 is the 4096 layer
  encoded_frames = TimeDistributed(cnn, name="encoded_frames")(video)
  encoded_sequence = LSTM(256, name="encoded_seqeunce")(encoded_frames)
  hidden_layer = Dense(1024, activation="relu", name="hidden_layer")(encoded_sequence)
  outputs = Dense(10, activation="softmax")(hidden_layer)

  model = Model(video, outputs)

That looks like this:

[model architecture diagram]

Now I want to feed an additional 1D vector of 784 video features into the last layer. I tried replacing the last two lines with:

  encoding_input = keras.Input(shape=(784,), name="Encoding", dtype="float32")
  sentence_features = layers.Dense(units=60, name="sentence_features")(encoding_input)
  x = layers.concatenate([sentence_features, hidden_layer])
  outputs = Dense(10, activation="softmax")(x)

But got the error:

ValueError: Graph disconnected: cannot obtain value for tensor Tensor("Sentence-Input-Encoding_3:0", shape=(None, 784), dtype=float32) at layer "sentence_features". The following previous layers were accessed without issue: ['encoded_frames', 'encoded_seqeunce']

Any suggestions?


Solution

  • Your network now has two inputs, so don't forget to pass both when building the model:

    model = Model([video,encoding_input], outputs)
    

    Full example:

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import (Input, Dense, LSTM,
                                         TimeDistributed, concatenate)
    from tensorflow.keras.models import Model

    frames, channels, rows, columns = 5, 3, 224, 224

    video = Input(shape=(frames, rows, columns, channels))

    cnn_base = VGG16(input_shape=(rows, columns, channels),
                     weights="imagenet",
                     include_top=True)
    cnn_base.trainable = False

    # use the 4096-unit fully connected layer as the per-frame feature extractor
    cnn = Model(cnn_base.input, cnn_base.layers[-3].output, name="VGG_fm")
    encoded_frames = TimeDistributed(cnn, name="encoded_frames")(video)
    encoded_sequence = LSTM(256, name="encoded_seqeunce")(encoded_frames)
    hidden_layer = Dense(1024, activation="relu", name="hidden_layer")(encoded_sequence)

    encoding_input = Input(shape=(784,), name="Encoding", dtype="float32")
    sentence_features = Dense(units=60, name="sentence_features")(encoding_input)
    x = concatenate([sentence_features, hidden_layer])
    outputs = Dense(10, activation="softmax")(x)

    model = Model([video, encoding_input], outputs)  # <=== double input
    model.summary()
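    With two inputs, `fit` and `predict` must receive a list of arrays in the same order as the inputs list passed to `Model`, with matching batch dimensions. A minimal NumPy sketch of the expected shapes (the batch size of 8 and the random dummy data are assumptions for illustration, not part of the original post):

    ```python
    import numpy as np

    batch, frames, rows, columns, channels = 8, 5, 224, 224, 3

    # Dummy batch matching the "video" input: (batch, frames, rows, columns, channels)
    video_batch = np.random.rand(batch, frames, rows, columns, channels).astype("float32")

    # Dummy batch matching the "Encoding" input: (batch, 784)
    feature_batch = np.random.rand(batch, 784).astype("float32")

    # One-hot labels for the 10-way softmax output: (batch, 10)
    labels = np.eye(10)[np.random.randint(0, 10, size=batch)].astype("float32")

    # Order must match Model([video, encoding_input], outputs):
    # model.compile(optimizer="adam", loss="categorical_crossentropy")
    # model.fit([video_batch, feature_batch], labels, epochs=1)

    print(video_batch.shape, feature_batch.shape, labels.shape)
    ```

    A dict keyed by input name also works, e.g. `model.fit({"input_1": video_batch, "Encoding": feature_batch}, labels)`, which avoids ordering mistakes (the video input's name depends on how it was declared).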