I have the following network (a pretrained CNN + LSTM to classify videos):
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input, Dense, LSTM, TimeDistributed
from tensorflow.keras.models import Model

frames, channels, rows, columns = 5, 3, 224, 224

video = Input(shape=(frames, rows, columns, channels))

cnn_base = VGG16(input_shape=(rows, columns, channels),
                 weights="imagenet",
                 include_top=True)  # <=== include_top=True
cnn_base.trainable = False

# layers[-3] is the first 4096-unit fully connected layer (fc1)
cnn = Model(cnn_base.input, cnn_base.layers[-3].output, name="VGG_fm")

encoded_frames = TimeDistributed(cnn, name="encoded_frames")(video)
encoded_sequence = LSTM(256, name="encoded_seqeunce")(encoded_frames)
hidden_layer = Dense(1024, activation="relu", name="hidden_layer")(encoded_sequence)
outputs = Dense(10, activation="softmax")(hidden_layer)
model = Model(video, outputs)
The resulting architecture looks as expected (model plot omitted).
Now, I want to feed an additional 1-D vector of 784 features per video into the last layer. I tried replacing the last two lines with:
from tensorflow import keras
from tensorflow.keras import layers

encoding_input = keras.Input(shape=(784,), name="Encoding", dtype="float")
sentence_features = layers.Dense(units=60, name="sentence_features")(encoding_input)
x = layers.concatenate([sentence_features, hidden_layer])
outputs = Dense(10, activation="softmax")(x)
But I got the error:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("Sentence-Input-Encoding_3:0", shape=(None, 784), dtype=float32) at layer "sentence_features". The following previous layers were accessed without issue: ['encoded_frames', 'encoded_seqeunce']
Any suggestions?
Your network now has two inputs. The graph is "disconnected" because encoding_input is never registered as a model input, so don't forget to pass both inputs to your model:
model = Model([video, encoding_input], outputs)
Full example:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input, Dense, LSTM, TimeDistributed, concatenate
from tensorflow.keras.models import Model

frames, channels, rows, columns = 5, 3, 224, 224

video = Input(shape=(frames, rows, columns, channels))

cnn_base = VGG16(input_shape=(rows, columns, channels),
                 weights="imagenet",
                 include_top=True)
cnn_base.trainable = False

# per-frame feature extractor: the first 4096-unit fully connected layer (fc1) of VGG16
cnn = Model(cnn_base.input, cnn_base.layers[-3].output, name="VGG_fm")

encoded_frames = TimeDistributed(cnn, name="encoded_frames")(video)
encoded_sequence = LSTM(256, name="encoded_seqeunce")(encoded_frames)
hidden_layer = Dense(1024, activation="relu", name="hidden_layer")(encoded_sequence)

# second input: the 784-dimensional feature vector for the whole video
encoding_input = Input(shape=(784,), name="Encoding", dtype="float")
sentence_features = Dense(units=60, name="sentence_features")(encoding_input)

x = concatenate([sentence_features, hidden_layer])
outputs = Dense(10, activation="softmax")(x)

model = Model([video, encoding_input], outputs)  # <=== double input
model.summary()
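To check that both inputs wire up end to end, here is a minimal training sketch. The dummy data, sample count, optimizer, loss, and epoch/batch settings below are assumptions for illustration only, not part of the original question or answer:

import numpy as np

# hypothetical dummy data: 8 videos of 5 frames each, plus one 784-dim feature vector per video
n_samples = 8
X_video = np.random.rand(n_samples, frames, rows, columns, channels).astype("float32")
X_features = np.random.rand(n_samples, 784).astype("float32")
y = np.random.randint(0, 10, size=(n_samples,))  # 10 classes, integer labels

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# both inputs are passed as a list, in the same order as Model([video, encoding_input], outputs)
model.fit([X_video, X_features], y, epochs=1, batch_size=2)

The same list ordering applies at prediction time: model.predict([X_video, X_features]).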