
Keras Seq2Seq Introduction


An introduction to Seq2Seq models in Keras was published a few weeks ago and can be found here. There is one part of this code that I do not really understand:

decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

Here the decoder_lstm is defined. It is an LSTM layer with latent_dim units, and the encoder's states are used as the initial_state of the decoder.

What I do not understand is why a Dense layer is added after the LSTM layer, and why it works. The decoder returns the whole sequence because of return_sequences=True, so how can adding a Dense layer after it work?

I guess I am missing something here.


Solution

  • Although Dense layers are most commonly applied to 2D inputs of shape (batch, dim), in newer versions of Keras they also accept 3D inputs of shape (batch, timesteps, dim).

    If you don't flatten this 3D data, the Dense layer is applied independently to each time step, and you get outputs of shape (batch, timesteps, dense_units).

    You can check the two little models below and confirm that, independently of the number of time steps, both Dense layers have the same number of parameters, showing that their weights act only on the last dimension.

    from keras.layers import Input, Dense
    from keras.models import Model

    # model with time steps: input shape (batch, 7, 12)
    inp = Input((7, 12))
    out = Dense(5)(inp)
    model = Model(inp, out)
    model.summary()

    # model without time steps: input shape (batch, 12)
    inp2 = Input((12,))
    out2 = Dense(5)(inp2)
    model2 = Model(inp2, out2)
    model2.summary()
    

    Both summaries will show 65 parameters (12*5 weights + 5 biases), despite the different input shapes.
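
    The same reasoning applies to the decoder in the question. Below is a minimal sketch (using made-up values latent_dim=256 and num_decoder_tokens=50, and leaving out the encoder and initial_state for brevity) showing that the Dense layer keeps the time dimension of the decoder's output:

    from keras.layers import Input, LSTM, Dense
    from keras.models import Model

    latent_dim = 256         # assumed value, for illustration only
    num_decoder_tokens = 50  # assumed value, for illustration only

    # decoder input: (batch, timesteps, num_decoder_tokens)
    decoder_inputs = Input(shape=(None, num_decoder_tokens))

    # return_sequences=True -> output shape (batch, timesteps, latent_dim)
    decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_inputs)

    # Dense acts on the last axis only -> (batch, timesteps, num_decoder_tokens)
    decoder_dense = Dense(num_decoder_tokens, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)

    Model(decoder_inputs, decoder_outputs).summary()

    The summary shows an output shape of (None, None, 50): the sequence length is preserved, and only the last dimension is transformed by the Dense layer, which is exactly what you want for predicting a token distribution at every time step.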