I am trying to boost the performance of an object detection task with sequential information, using a ConvLSTM.
A typical ConvLSTM model takes a 5D tensor with shape (samples, time_steps, channels, rows, cols)
as input.
As stated in this post, a long sequence of 500 images needs to be split into smaller fragments for a PyTorch ConvLSTM layer. For example, it could be split into 10 fragments, each with 50 time steps.
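For example (the sizes below are just placeholders for my data), the full sequence and its fragments would look like this:

import torch

# full video: (samples, time_steps, channels, rows, cols)
full_sequence = torch.randn(1, 500, 3, 64, 64)

# 10 fragments of 50 time steps each, shape (1, 50, 3, 64, 64)
fragments = full_sequence.split(50, dim=1)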
I have two goals:
1. I want the network to remember the state across the 10 fragment sequences, i.e. how can the hidden state be passed between the fragments?
2. I want to feed in the images (of the video) one by one, i.e. the long sequence of 500 images is split into 500 fragments, each containing only one image. The input would then look like (all_samples, channels, rows, cols). This only makes sense if the first goal can be achieved (see the sketch below).
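To make the second goal concrete, this is roughly what I have in mind, with conv_lstm standing in for whatever ConvLSTM implementation is used (PyTorch has no built-in ConvLSTM layer, so the interface here is just my assumption of an nn.LSTM-like call signature):

# conv_lstm is assumed to behave like nn.LSTM: out, state = conv_lstm(x, state)
state = None                           # zero state at the start of the video
for frame in video:                    # frame: (samples, channels, rows, cols)
    x = frame.unsqueeze(1)             # add a time dimension of length 1
    out, state = conv_lstm(x, state)   # carry the state on to the next frame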
I found some good answers for TensorFlow, but I am using PyTorch.
TensorFlow: Remember LSTM state for next batch (stateful LSTM)
The best way to pass the LSTM state between batches
What is the best way to implement stateful LSTM/ConvLSTM in Pytorch?
I found this post, which has a good example:
import torch.nn as nn

model = nn.LSTM(input_size=20, hidden_size=h_size)  # h_size: chosen hidden dimension
out1, (h1, c1) = model(x1)                          # first chunk, zero-initialized state
out2, (h2, c2) = model(x2, (h1, c1))                # next chunk, reusing the previous state
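If the state is carried across training iterations like this, I assume the hidden state has to be detached at the fragment boundary, so that backpropagation does not try to reach back through the previous fragment's graph, roughly:

h1, c1 = h1.detach(), c1.detach()     # cut the graph at the fragment boundary
out2, (h2, c2) = model(x2, (h1, c1))  # state values are kept, gradients are not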