Tags: python, tensorflow, keras, seq2seq, encoder-decoder

Apply an Encoder-Decoder (Seq2Seq) inference model with Attention


Hello, StackOverflow community!

I'm trying to create an inference model for a seq2seq (Encoder-Decoder) model with attention. Here is the definition of the inference model:

model = compile_model(tf.keras.models.load_model(constant.MODEL_PATH, compile=False))

# Rebuild the encoder as a standalone model that returns the final LSTM states
encoder_input = model.input[0]
encoder_output, encoder_h, encoder_c = model.layers[1].output
encoder_state = [encoder_h, encoder_c]
encoder_model = tf.keras.Model(encoder_input, encoder_state)

# Rebuild the decoder with fresh Input tensors for its initial LSTM state
decoder_input = model.input[1]
decoder = model.layers[3]
decoder_new_h = tf.keras.Input(shape=(n_units,), name='input_3')
decoder_new_c = tf.keras.Input(shape=(n_units,), name='input_4')
decoder_input_initial_state = [decoder_new_h, decoder_new_c]

decoder_output, decoder_h, decoder_c = decoder(decoder_input, initial_state=decoder_input_initial_state)
decoder_output_state = [decoder_h, decoder_c]

# These lines cause an error
context = model.layers[4]([encoder_output, decoder_output])
decoder_combined_context = model.layers[5]([context, decoder_output])
output = model.layers[6](decoder_combined_context)
output = model.layers[7](output)
# end

decoder_model = tf.keras.Model([decoder_input] + decoder_input_initial_state, [output] + decoder_output_state)
return encoder_model, decoder_model

When I run this code, I get the following error:

ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_5:0", shape=(None, None, 20), dtype=float32) at layer "lstm_4". The following previous layers were accessed without issue: ['lstm_5']

If I exclude the attention block, the model is built without any errors at all:

model = compile_model(tf.keras.models.load_model(constant.MODEL_PATH, compile=False))

encoder_input = model.input[0]
encoder_output, encoder_h, encoder_c = model.layers[1].output
encoder_state = [encoder_h, encoder_c]
encoder_model = tf.keras.Model(encoder_input, encoder_state)

decoder_input = model.input[1]
decoder = model.layers[3]
decoder_new_h = tf.keras.Input(shape=(n_units,), name='input_3')
decoder_new_c = tf.keras.Input(shape=(n_units,), name='input_4')
decoder_input_initial_state = [decoder_new_h, decoder_new_c]

decoder_output, decoder_h, decoder_c = decoder(decoder_input, initial_state=decoder_input_initial_state)
decoder_output_state = [decoder_h, decoder_c]

# These lines cause an error
# context = model.layers[4]([encoder_output, decoder_output])
# decoder_combined_context = model.layers[5]([context, decoder_output])
# output = model.layers[6](decoder_combined_context)
# output = model.layers[7](output)
# end

decoder_model = tf.keras.Model([decoder_input] + decoder_input_initial_state, [decoder_output] + decoder_output_state)
return encoder_model, decoder_model

Solution

  • I think you also need to return the encoder output as an output of the encoder model and then feed it as an input to the decoder model, since the attention part requires it. Maybe these changes could help:

    model = compile_model(tf.keras.models.load_model(constant.MODEL_PATH, compile=False))
    encoder_input = model.input[0]
    encoder_output, encoder_h, encoder_c = model.layers[1].output
    encoder_state = [encoder_h, encoder_c]
    encoder_model = tf.keras.Model(inputs=[encoder_input], outputs=encoder_state + [encoder_output])
    
    decoder_input = model.input[1]
    decoder_input2 = tf.keras.Input(shape=x)  # where x is the shape of the encoder output
    decoder = model.layers[3]
    decoder_new_h = tf.keras.Input(shape=(n_units,), name='input_3')
    decoder_new_c = tf.keras.Input(shape=(n_units,), name='input_4')
    decoder_input_initial_state = [decoder_new_h, decoder_new_c]
    
    decoder_output, decoder_h, decoder_c = decoder(decoder_input, initial_state=decoder_input_initial_state)
    decoder_output_state = [decoder_h, decoder_c]
    
    context = model.layers[4]([decoder_input2, decoder_output])
    decoder_combined_context = model.layers[5]([context, decoder_output])
    output = model.layers[6](decoder_combined_context)
    output = model.layers[7](output)
    
    decoder_model = tf.keras.Model([decoder_input, decoder_input2] + decoder_input_initial_state, [output] + decoder_output_state)
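
    Once both models are built, inference is the usual seq2seq loop: encode the source once, then run the decoder one step at a time, passing the encoder outputs (needed by the attention layer) and the running LSTM states back in at every step. Below is a minimal greedy-decoding sketch, assuming the decoder consumes one-hot vectors; the names start_token_id, end_token_id, vocab_size and max_len are placeholders for whatever your data pipeline uses.

    import numpy as np

    def decode_sequence(input_seq, encoder_model, decoder_model,
                        start_token_id, end_token_id, vocab_size, max_len=50):
        # Encode the source sequence once: the states initialise the decoder,
        # the outputs are reused by the attention layer at every step.
        enc_h, enc_c, enc_outputs = encoder_model.predict(input_seq)

        # First decoder input: a one-hot start-of-sequence token
        # (adjust if your decoder expects token ids or embeddings instead).
        target_seq = np.zeros((1, 1, vocab_size))
        target_seq[0, 0, start_token_id] = 1.0
        states = [enc_h, enc_c]

        decoded = []
        for _ in range(max_len):
            output, h, c = decoder_model.predict([target_seq, enc_outputs] + states)
            token_id = int(np.argmax(output[0, -1, :]))
            if token_id == end_token_id:
                break
            decoded.append(token_id)
            # Feed the prediction and the updated LSTM states back in.
            target_seq = np.zeros((1, 1, vocab_size))
            target_seq[0, 0, token_id] = 1.0
            states = [h, c]
        return decoded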