Tags: python, keras, seq2seq, temporal

LSTM seq2seq input and output with different number of time steps


I am new to this field and am currently working on a video action prediction project using Keras. The input data takes 10% of the frames of each video and collapses all identical successive actions into a single action, for example [0,0,0,1,1,1,2] -> [0,1,2]. After padding and one-hot encoding, the shape of the input data is (1460, 6, 48) -> (number of videos, number of actions, one-hot encoding of the 48 actions). I would like to predict all future actions for each video, so the shape of the output should be (1460, 23, 48) -> (number of videos, max timesteps, one-hot encoding of the 48 actions).
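
For context, the preprocessing looks roughly like this (a simplified sketch; the helper names are illustrative, not the actual project code):

from itertools import groupby
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

NUM_ACTIONS = 48   # size of the action vocabulary
MAX_IN_STEPS = 6   # padded length of the collapsed input sequences

def collapse_repeats(actions):
    # Merge runs of identical successive actions, e.g. [0,0,0,1,1,1,2] -> [0,1,2]
    return [label for label, _ in groupby(actions)]

def encode_inputs(videos):
    # videos: list of per-video action-label lists -> array of shape (n, 6, 48)
    collapsed = [collapse_repeats(v) for v in videos]
    padded = pad_sequences(collapsed, maxlen=MAX_IN_STEPS, padding='post')
    return to_categorical(padded, num_classes=NUM_ACTIONS)

X = encode_inputs([[0, 0, 0, 1, 1, 1, 2], [3, 3, 4]])
print(X.shape)   # (2, 6, 48)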

Here is my current approach, which does not work.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def lstm_model(frame_len, max_timesteps):
    model = Sequential()
    model.add(LSTM(100, input_shape=(None, 48), return_sequences=True))
    model.add(Dense(48, activation='tanh'))
    model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model


I would like to know if I have to keep the number of timesteps the same for the input and the output. If not, how could I modify the model to fit such data?

Any help would be appreciated.


Solution

  • You can do something like this:

    1. Encode your input data with an LSTM
    2. Repeat the encoded vector the required number of times
    3. Decode the repeated vector with another LSTM

    In Keras, it looks like this:

    from tensorflow.keras import layers, models
    
    input_timesteps=10
    input_features=2
    output_timesteps=3
    output_features=1
    units=100
    
    #Input
    encoder_inputs = layers.Input(shape=(input_timesteps,input_features))
    
    #Encoder
    encoder = layers.LSTM(units, return_sequences=False)(encoder_inputs)
    
    #Repeat
    decoder = layers.RepeatVector(output_timesteps)(encoder)
    
    #Decoder
    decoder = layers.LSTM(units, return_sequences=True)(decoder)
    
    #Output
    out = layers.TimeDistributed(layers.Dense(output_features))(decoder)
    
    model = models.Model(encoder_inputs, out)
    model.summary()
    

    It gives you:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_1 (InputLayer)         [(None, 10, 2)]           0         
    _________________________________________________________________
    lstm (LSTM)                  (None, 100)               41200     
    _________________________________________________________________
    repeat_vector (RepeatVector) (None, 3, 100)            0         
    _________________________________________________________________
    lstm_1 (LSTM)                (None, 3, 100)            80400     
    _________________________________________________________________
    time_distributed (TimeDistri (None, 3, 1)              101       
    =================================================================
    
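    For the data in the question you would use input_timesteps=6, input_features=48, output_timesteps=23 and output_features=48. Since the targets are one-hot encoded actions, a softmax output with categorical cross-entropy is probably a better fit than mae with tanh; this compile choice is a suggestion, not the only option:

    from tensorflow.keras import layers, models

    #Shapes taken from the question: inputs (1460, 6, 48), targets (1460, 23, 48)
    encoder_inputs = layers.Input(shape=(6, 48))
    encoder = layers.LSTM(100, return_sequences=False)(encoder_inputs)
    decoder = layers.RepeatVector(23)(encoder)
    decoder = layers.LSTM(100, return_sequences=True)(decoder)

    #Softmax over the 48 action classes at every output timestep
    out = layers.TimeDistributed(layers.Dense(48, activation='softmax'))(decoder)

    model = models.Model(encoder_inputs, out)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    #model.fit(X, y, ...) with X of shape (1460, 6, 48) and y of shape (1460, 23, 48)
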

    If you want to keep the cell state from the encoder and reuse it in the decoder, you can do that with return_state=True. Check this question.
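
    A minimal sketch of that variant, reusing the variable names from the snippet above: return_state=True exposes the encoder LSTM's final hidden and cell states, which are then fed to the decoder LSTM via initial_state.

    from tensorflow.keras import layers, models

    encoder_inputs = layers.Input(shape=(input_timesteps, input_features))

    #return_state=True also returns the final hidden state and cell state
    encoder_out, state_h, state_c = layers.LSTM(units, return_state=True)(encoder_inputs)

    #Repeat the encoder output as before
    decoder = layers.RepeatVector(output_timesteps)(encoder_out)

    #Initialise the decoder LSTM with the encoder's final states
    decoder = layers.LSTM(units, return_sequences=True)(decoder, initial_state=[state_h, state_c])

    out = layers.TimeDistributed(layers.Dense(output_features))(decoder)
    model = models.Model(encoder_inputs, out)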