python · neural-network · keras · chatbot · embedding

Keras seq2seq - word embedding


I am working on a generative chatbot based on seq2seq in Keras. I used code from this site: https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/

My models look like this:

from keras.layers import Input, LSTM, Dense
from keras.models import Model

# define training encoder
encoder_inputs = Input(shape=(None, n_input))
encoder = LSTM(n_units, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

# define training decoder
decoder_inputs = Input(shape=(None, n_output))
decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(n_output, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)

# define inference decoder
decoder_state_input_h = Input(shape=(n_units,))
decoder_state_input_c = Input(shape=(n_units,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)

This neural network is designed to work with one-hot encoded vectors, so its input looks, for example, like this:

[[[0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]
  [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]]
  [[0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
   0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
   0. 0. 0. 0. 0.]]]
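Arrays like the one above can be produced from integer-encoded sequences with keras.utils.to_categorical; here is a minimal sketch, assuming a vocabulary of 51 tokens (the length of the vectors shown), with the indices read off the example above:

import numpy as np
from keras.utils import to_categorical

# integer-encoded sequences, one row per sentence (indices taken from the example)
sequences = np.array([[4, 7, 2],
                      [4, 19, 19]])

# one-hot encode to shape (batch, timesteps, vocab_size)
one_hot = to_categorical(sequences, num_classes=51)
print(one_hot.shape)  # (2, 3, 51)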

How can I rebuild these models to work with words? I would like to use a word embedding layer, but I have no idea how to connect an embedding layer to these models.

My input should be [[1,5,6,7,4], [4,5,7,5,4], [7,5,4,2,1]] where the integers are indices of words in the vocabulary.
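Integer sequences of this kind can be obtained, for example, with the Keras Tokenizer and pad_sequences; a rough sketch with made-up sentences:

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# made-up corpus just to illustrate the integer encoding
sentences = ["how are you today", "i am fine thanks", "see you later"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)      # lists of word indices
padded = pad_sequences(sequences, maxlen=5, padding='post')
print(padded)  # one row of word indices per sentence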

I tried everything but I'm still getting errors. Can you help me, please?


Solution

  • I finally got it working. Here is the code:

    Shared_Embedding = Embedding(output_dim=embedding, input_dim=vocab_size, name="Embedding")
    
    encoder_inputs = Input(shape=(sentenceLength,), name="Encoder_input")
    encoder = LSTM(n_units, return_state=True, name='Encoder_lstm') 
    word_embedding_context = Shared_Embedding(encoder_inputs) 
    encoder_outputs, state_h, state_c = encoder(word_embedding_context) 
    encoder_states = [state_h, state_c] 
    decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True, name="Decoder_lstm")
    
    decoder_inputs = Input(shape=(sentenceLength,), name="Decoder_input")
    word_embedding_answer = Shared_Embedding(decoder_inputs) 
    decoder_outputs, _, _ = decoder_lstm(word_embedding_answer, initial_state=encoder_states) 
    decoder_dense = Dense(vocab_size, activation='softmax', name="Dense_layer") 
    decoder_outputs = decoder_dense(decoder_outputs) 
    
    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    
    encoder_model = Model(encoder_inputs, encoder_states) 
    
    decoder_state_input_h = Input(shape=(n_units,), name="H_state_input") 
    decoder_state_input_c = Input(shape=(n_units,), name="C_state_input") 
    decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c] 
    decoder_outputs, state_h, state_c = decoder_lstm(word_embedding_answer, initial_state=decoder_states_inputs) 
    decoder_states = [state_h, state_c] 
    decoder_outputs = decoder_dense(decoder_outputs)
    
    decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
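
    To train this model with integer targets, one option is sparse_categorical_crossentropy; a minimal sketch, where encoder_input_data, decoder_input_data and decoder_target_data are assumed to be padded integer arrays of shape (samples, sentenceLength), with the targets shifted one step ahead of the decoder inputs:

    import numpy as np

    model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')

    # the targets get an extra trailing axis so the sparse loss lines up
    # with the (batch, timesteps, vocab_size) softmax output
    model.fit([encoder_input_data, decoder_input_data],
              np.expand_dims(decoder_target_data, -1),
              batch_size=64, epochs=30, validation_split=0.1)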
    

    "model" is training model encoder_model and decoder_model are inference models