Tags: python, keras-layer, autoencoder, seq2seq

Specifying a seq2seq autoencoder. What does RepeatVector do? And what is the effect of batch learning on predicting output?


I am building a basic seq2seq autoencoder, but I'm not sure if I'm doing it correctly.

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

model = Sequential()
# Encoder       
model.add(LSTM(32, activation='relu', input_shape =(timesteps, n_features ), return_sequences=True))
model.add(LSTM(16, activation='relu', return_sequences=False))
model.add(RepeatVector(timesteps))
# Decoder
model.add(LSTM(16, activation='relu', return_sequences=True))
model.add(LSTM(32, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))

The model is then fit using a batch size parameter

model.fit(data, data,       
          epochs=30, 
          batch_size = 32)

The model is compiled with the mse loss function and seems to learn.
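For completeness, the compile call is roughly the following (the optimizer shown here is just a placeholder; only the mse loss is the point):

model.compile(optimizer='adam', loss='mse')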

To get the encoder output for the test data, I am using a K function:

from keras import backend as K

get_encoder_output = K.function([model.layers[0].input],
                                [model.layers[1].output])

encoder_output = get_encoder_output([test_data])[0]

My first question is whether the model is specified correctly, in particular whether the RepeatVector layer is needed. I'm not sure what it is doing. What if I omit it and instead give the preceding layer return_sequences=True?
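From the docs, my understanding is that RepeatVector(timesteps) simply tiles the 2-D encoder output of shape (batch, 16) into a 3-D tensor of shape (batch, timesteps, 16), so the decoder LSTM has a sequence to read. A quick standalone check (the numbers here are just an example):

from keras.models import Sequential
from keras.layers import RepeatVector
import numpy as np

# RepeatVector(3) repeats a (batch, 16) input 3 times along a new time axis
m = Sequential([RepeatVector(3, input_shape=(16,))])
print(m.predict(np.zeros((1, 16))).shape)  # -> (1, 3, 16)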

My second question: do I need to tell get_encoder_output about the batch_size that was used in training?

Thanks in advance for any help on either question.


Solution

  • This might prove useful to you:

    As a toy problem I created a seq2seq model for predicting the continuation of different sine waves.

    This was the model:

    from keras.models import Model
    from keras.layers import Input, LSTM, Dense, Lambda
    from keras import backend as K

    def create_seq2seq():
        features_num=5 
        latent_dim=40
    
        ##
        encoder_inputs = Input(shape=(None, features_num))
        encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoder_inputs)
        encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoded)
        encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoded)
        # The last encoder LSTM also returns its final hidden and cell states
        encoded = LSTM(latent_dim, return_state=True)(encoded)

        encoder = Model(inputs=encoder_inputs, outputs=encoded)
        ##
    
        encoder_outputs, state_h, state_c = encoder(encoder_inputs)
        encoder_states = [state_h, state_c]
    
        decoder_inputs=Input(shape=(1, features_num))
        decoder_lstm_1 = LSTM(latent_dim, return_sequences=True, return_state=True)
        decoder_lstm_2 = LSTM(latent_dim, return_sequences=True, return_state=True)
        decoder_lstm_3 = LSTM(latent_dim, return_sequences=True, return_state=True)
        decoder_lstm_4 = LSTM(latent_dim, return_sequences=True, return_state=True)
    
        decoder_dense = Dense(features_num)
    
        all_outputs = []
        inputs = decoder_inputs
    
    
        # Seed the bottom decoder LSTM with the encoder states; the remaining
        # state variables are placeholders that are overwritten after the
        # first timestep
        states_1 = encoder_states
        states_2 = states_1; states_3 = states_1; states_4 = states_1
        ###
    
        # First timestep: only the bottom decoder LSTM is seeded with the
        # encoder states; the upper LSTMs start from their default zero states
        for _ in range(1):
            outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
            outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1)
            outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2)
            outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3)
    
            # Store the current prediction (we will concatenate all predictions later)
            outputs = decoder_dense(outputs_4)
            all_outputs.append(outputs)
            # Reinject the outputs as inputs for the next loop iteration
            # as well as update the states
            inputs = outputs
            states_1 = [state_h_1, state_c_1]
            states_2 = [state_h_2, state_c_2]
            states_3 = [state_h_3, state_c_3]
            states_4 = [state_h_4, state_c_4]
    
    
        # Remaining 149 timesteps: each decoder LSTM now carries over its own
        # states from the previous step
        for _ in range(149):
            outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
            outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1, initial_state=states_2)
            outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2, initial_state=states_3)
            outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3, initial_state=states_4)
    
            # Store the current prediction (we will concatenate all predictions later)
            outputs = decoder_dense(outputs_4)
            all_outputs.append(outputs)
            # Reinject the outputs as inputs for the next loop iteration
            # as well as update the states
            inputs = outputs
            states_1 = [state_h_1, state_c_1]
            states_2 = [state_h_2, state_c_2]
            states_3 = [state_h_3, state_c_3]
            states_4 = [state_h_4, state_c_4]
    
    
        # Concatenate all predictions
        decoder_outputs = Lambda(lambda x: K.concatenate(x, axis=1))(all_outputs)   
    
        model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    
        #model = load_model('pre_model.h5')
    
    
        model.summary()
        return model
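
    Fitting it then looks roughly like the sketch below. The 5 features and the 150 decoder timesteps follow from the code above; the number of sequences, the encoder length, the optimizer and the random placeholder arrays are only there to illustrate the shapes:

    import numpy as np

    model = create_seq2seq()
    model.compile(optimizer='adam', loss='mse')

    # Illustrative shapes: 1000 sequences, 100 encoder timesteps, 5 features.
    # The decoder seed is a single frame; the target covers the 150 generated
    # timesteps (1 + 149 in the loops above).
    encoder_input_data = np.random.rand(1000, 100, 5)
    decoder_seed = np.random.rand(1000, 1, 5)
    decoder_targets = np.random.rand(1000, 150, 5)

    model.fit([encoder_input_data, decoder_seed], decoder_targets,
              batch_size=32, epochs=10)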