Tags: python, keras, seq2seq, temporal

LSTM seq2seq input and output with different number of time steps


I am new to this field and am currently working on a video action prediction project using Keras. The input data takes 10% of the frames of each video and collapses all identical successive actions into a single action, for example [0,0,0,1,1,1,2] -> [0,1,2]. After padding and one-hot encoding, the shape of the input data is (1460, 6, 48) -> (number of videos, number of actions, one-hot encoding of the 48 actions). I would like to predict all future actions for each video, so the shape of the output should be (1460, 23, 48) -> (number of videos, max timesteps, one-hot encoding of the 48 actions).
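
For context, the preprocessing looks roughly like this (a simplified sketch; the helper names are illustrative, not the actual project code):

from itertools import groupby
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

NUM_ACTIONS = 48   # size of the action vocabulary
MAX_IN_STEPS = 6   # padded length of the collapsed input sequences

def collapse_repeats(actions):
    # Merge runs of identical successive actions, e.g. [0,0,0,1,1,1,2] -> [0,1,2]
    return [label for label, _ in groupby(actions)]

def encode_inputs(videos):
    # videos: list of per-video action-label lists -> array of shape (n, 6, 48)
    collapsed = [collapse_repeats(v) for v in videos]
    padded = pad_sequences(collapsed, maxlen=MAX_IN_STEPS, padding='post')
    return to_categorical(padded, num_classes=NUM_ACTIONS)

X = encode_inputs([[0, 0, 0, 1, 1, 1, 2], [3, 3, 4]])
print(X.shape)   # (2, 6, 48)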

Here is my current approach, which does not work.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def lstm_model(frame_len, max_timesteps):
    model = Sequential()
    model.add(LSTM(100, input_shape=(None, 48), return_sequences=True))
    model.add(Dense(48, activation='tanh'))
    model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model


I would like to know if I have to keep the number of timesteps the same for the input and the output. If not, how could I modify the model to fit such data?

Any help would be appreciated.


Solution

  • You can do something like this:

    1. Encode your input data with an LSTM
    2. Repeat the encoded vector the required number of times
    3. Decode the repeated vector with another LSTM

    In Keras, it looks like this:

    from tensorflow.keras import layers, models
    
    input_timesteps=10
    input_features=2
    output_timesteps=3
    output_features=1
    units=100
    
    #Input
    encoder_inputs = layers.Input(shape=(input_timesteps,input_features))
    
    #Encoder
    encoder = layers.LSTM(units, return_sequences=False)(encoder_inputs)
    
    #Repeat
    decoder = layers.RepeatVector(output_timesteps)(encoder)
    
    #Decoder
    decoder = layers.LSTM(units, return_sequences=True)(decoder)
    
    #Output
    out = layers.TimeDistributed(layers.Dense(output_features))(decoder)
    
    model = models.Model(encoder_inputs, out)
    model.summary()
    

    It gives you:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_1 (InputLayer)         [(None, 10, 2)]           0         
    _________________________________________________________________
    lstm (LSTM)                  (None, 100)               41200     
    _________________________________________________________________
    repeat_vector (RepeatVector) (None, 3, 100)            0         
    _________________________________________________________________
    lstm_1 (LSTM)                (None, 3, 100)            80400     
    _________________________________________________________________
    time_distributed (TimeDistri (None, 3, 1)              101       
    =================================================================
    
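    For the data in the question you would use input_timesteps=6, input_features=48, output_timesteps=23 and output_features=48. Since the targets are one-hot encoded actions, a softmax output with categorical cross-entropy is probably a better fit than mae with tanh; this compile choice is a suggestion, not the only option:

    from tensorflow.keras import layers, models

    #Shapes taken from the question: inputs (1460, 6, 48), targets (1460, 23, 48)
    encoder_inputs = layers.Input(shape=(6, 48))
    encoder = layers.LSTM(100, return_sequences=False)(encoder_inputs)
    decoder = layers.RepeatVector(23)(encoder)
    decoder = layers.LSTM(100, return_sequences=True)(decoder)

    #Softmax over the 48 action classes at every output timestep
    out = layers.TimeDistributed(layers.Dense(48, activation='softmax'))(decoder)

    model = models.Model(encoder_inputs, out)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    #model.fit(X, y, ...) with X of shape (1460, 6, 48) and y of shape (1460, 23, 48)
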

    If you want to keep the cell state from the encoder and reuse it in the decoder, you can do that with return_state=True. Check this question.
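
    A minimal sketch of that variant, reusing the variable names from the snippet above: return_state=True exposes the encoder LSTM's final hidden and cell states, which are then fed to the decoder LSTM via initial_state.

    from tensorflow.keras import layers, models

    encoder_inputs = layers.Input(shape=(input_timesteps, input_features))

    #return_state=True also returns the final hidden state and cell state
    encoder_out, state_h, state_c = layers.LSTM(units, return_state=True)(encoder_inputs)

    #Repeat the encoder output as before
    decoder = layers.RepeatVector(output_timesteps)(encoder_out)

    #Initialise the decoder LSTM with the encoder's final states
    decoder = layers.LSTM(units, return_sequences=True)(decoder, initial_state=[state_h, state_c])

    out = layers.TimeDistributed(layers.Dense(output_features))(decoder)
    model = models.Model(encoder_inputs, out)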