I am new to this field and am currently working on a video action prediction project using Keras. The input data takes 10% of the frames of each video and converts all identical successive actions into a single action, for example [0,0,0,1,1,1,2] -> [0,1,2]. After applying padding and one-hot encoding, the shape of the input data is (1460, 6, 48) -> (number of videos, number of actions, one-hot encoding over 48 actions). I would like to predict all future actions for each video. The shape of the output should be (1460, 23, 48) -> (number of videos, max timesteps, one-hot encoding over 48 actions).
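For reference, the preprocessing described above can be sketched roughly as follows. This is only an illustration: the helper names `collapse_repeats` and `one_hot_padded` are hypothetical, and padding with all-zero rows (and truncating longer sequences) is an assumption, since the exact padding scheme is not stated.

```python
from itertools import groupby

NUM_ACTIONS = 48  # number of action classes, from the shapes above
PAD_LEN = 6       # number of actions per video after padding, from the shapes above


def collapse_repeats(labels):
    """Merge runs of identical successive labels into one label each."""
    return [label for label, _ in groupby(labels)]


def one_hot_padded(actions, length=PAD_LEN, num_classes=NUM_ACTIONS):
    """One-hot encode each action, then pad with all-zero rows up to `length`.

    Zero-row padding and truncation to `length` are assumptions.
    """
    rows = []
    for action in actions[:length]:
        row = [0] * num_classes
        row[action] = 1
        rows.append(row)
    rows += [[0] * num_classes for _ in range(length - len(rows))]
    return rows


collapsed = collapse_repeats([0, 0, 0, 1, 1, 1, 2])  # [0, 1, 2]
encoded = one_hot_padded(collapsed)                  # 6 rows of 48 values each
```

Stacking one such `encoded` block per video gives the (1460, 6, 48) input array.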
Here is my current approach, which does not work.
def lstm_model(frame_len, max_timesteps):
    model = Sequential()
    model.add(LSTM(100, input_shape=(None,48), return_sequences=True))
    model.add(Dense(48, activation='tanh'))
    model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model
I would like to know whether I have to keep the number of timesteps the same for input and output. If not, how could I modify the model to fit such data?
Any help would be appreciated.
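You can do something like this: encode the whole input sequence into a single vector, repeat that vector once per output timestep, and decode the repeated vector into the output sequence. This way the number of timesteps does not have to be the same for input and output.
In Keras, it looks like: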
from tensorflow.keras import layers, models

input_timesteps = 10
input_features = 2
output_timesteps = 3
output_features = 1
units = 100

# Input
encoder_inputs = layers.Input(shape=(input_timesteps, input_features))
# Encoder: return only the final output, which summarizes the input sequence
encoder = layers.LSTM(units, return_sequences=False)(encoder_inputs)
# Repeat the encoded vector once per output timestep
decoder = layers.RepeatVector(output_timesteps)(encoder)
# Decoder: return the full output sequence
decoder = layers.LSTM(units, return_sequences=True)(decoder)
# Output: apply the same Dense layer to every output timestep
out = layers.TimeDistributed(layers.Dense(output_features))(decoder)

model = models.Model(encoder_inputs, out)
model.summary()
Calling model.summary() gives you:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 10, 2)]           0
_________________________________________________________________
lstm (LSTM)                  (None, 100)               41200
_________________________________________________________________
repeat_vector (RepeatVector) (None, 3, 100)            0
_________________________________________________________________
lstm_1 (LSTM)                (None, 3, 100)            80400
_________________________________________________________________
time_distributed (TimeDistri (None, 3, 1)              101
=================================================================
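Adapted to the shapes in the question (6 input timesteps, 23 output timesteps, 48 one-hot classes), the same layout might look like the sketch below. The softmax activation and categorical cross-entropy loss are assumptions on my part: they are the usual pairing for one-hot targets, whereas the question uses tanh with MAE. The name `action_model` and the choice of 100 units are only for illustration.

```python
from tensorflow.keras import layers, models

# Shapes taken from the question; `units` is an arbitrary choice.
input_timesteps = 6
output_timesteps = 23
num_actions = 48
units = 100

encoder_inputs = layers.Input(shape=(input_timesteps, num_actions))
# Encode the observed actions into a single vector
encoded = layers.LSTM(units)(encoder_inputs)
# Repeat that vector once per future timestep to predict
repeated = layers.RepeatVector(output_timesteps)(encoded)
decoded = layers.LSTM(units, return_sequences=True)(repeated)
# Assumption: softmax over the 48 actions at every output timestep
outputs = layers.TimeDistributed(
    layers.Dense(num_actions, activation="softmax")
)(decoded)

action_model = models.Model(encoder_inputs, outputs)
action_model.compile(
    loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
)
```

Training would then be a call such as `action_model.fit(X, y)` with `X` of shape (1460, 6, 48) and `y` of shape (1460, 23, 48).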
If you want to keep the cell state from the encoder to reuse in the decoder, you can do it with return_state=True. Check this question.
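A minimal sketch of that variant, reusing the sizes from the example above: with `return_state=True` the encoder LSTM also returns its final hidden and cell states, which can be passed to the decoder through `initial_state`. The name `state_model` is hypothetical, and the decoder is assumed to have the same number of units as the encoder so that the state shapes match.

```python
from tensorflow.keras import layers, models

input_timesteps, input_features = 10, 2
output_timesteps, output_features = 3, 1
units = 100

encoder_inputs = layers.Input(shape=(input_timesteps, input_features))
# return_state=True returns the output plus the final hidden and cell states
encoder_out, state_h, state_c = layers.LSTM(units, return_state=True)(encoder_inputs)
repeated = layers.RepeatVector(output_timesteps)(encoder_out)
# Start the decoder from the encoder's final states (same number of units required)
decoded = layers.LSTM(units, return_sequences=True)(
    repeated, initial_state=[state_h, state_c]
)
out = layers.TimeDistributed(layers.Dense(output_features))(decoded)

state_model = models.Model(encoder_inputs, out)
```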