Does a linear layer after a GRU preserve the sequence output order?

I'm dealing with the following scenario:

My input has the shape [batch_size, input_sequence_length, input_features], where:

input_sequence_length = 10

input_features = 3
My output has the shape [batch_size, output_sequence_length], where:

output_sequence_length = 5

i.e., for each window of 10 time steps (each step with 3 features), I need to predict the values of the next 5 time steps.
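A minimal sketch of how such (input, target) pairs could be cut from one long series. `make_windows` is a hypothetical helper, and it assumes the value to predict is one column (`target_col`) of the same 3-feature series; the question doesn't say which feature is the target.

```python
import torch

def make_windows(series: torch.Tensor, in_len: int = 10, out_len: int = 5, target_col: int = 0):
    """Hypothetical helper: slice a [T, 3] series into (input, target) pairs.

    Assumes the target is column `target_col` of the same series.
    """
    xs, ys = [], []
    # Each window needs in_len input steps followed by out_len target steps.
    for start in range(series.shape[0] - in_len - out_len + 1):
        xs.append(series[start : start + in_len])                                 # [in_len, 3]
        ys.append(series[start + in_len : start + in_len + out_len, target_col])  # [out_len]
    return torch.stack(xs), torch.stack(ys)

series = torch.arange(60, dtype=torch.float32).reshape(20, 3)  # toy series: 20 steps, 3 features
x, y = make_windows(series)
print(x.shape, y.shape)  # torch.Size([6, 10, 3]) torch.Size([6, 5])
```

With 20 steps there are 20 - 10 - 5 + 1 = 6 windows, giving inputs of shape [6, 10, 3] and targets of shape [6, 5].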

I built the following model:

import torch
import torch.nn as nn
import torchinfo

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        
        self.GRU = nn.GRU(input_size=3, hidden_size=32, num_layers=2, batch_first=True)
        self.fc  = nn.Linear(32, 5)
        
    def forward(self, input_series):
        
        output, h = self.GRU(input_series)                
        output    = output[:,  -1, :]       # keep only the last time step: [batch, 32]
        output    = self.fc(output)         # [batch, 5]
        output    = output.view(-1, 5, 1)   # reshape to [batch, 5, 1]
        return output
    
torchinfo.summary(MyModel(), (512, 10, 3))  



==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
MyModel                                  [512, 5, 1]               --
├─GRU: 1-1                               [512, 10, 32]             9,888
├─Linear: 1-2                            [512, 5]                  165
==========================================================================================
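To see where the five values come from, here is a sketch that runs the same layers step by step. The layers are rebuilt outside `MyModel` so the snippet is self-contained; the seed and random input are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(input_size=3, hidden_size=32, num_layers=2, batch_first=True)
fc = nn.Linear(32, 5)

x = torch.rand(512, 10, 3)
output, h = gru(x)
print(output.shape, h.shape)   # torch.Size([512, 10, 32]) torch.Size([2, 512, 32])

last = output[:, -1, :]        # last layer, last time step: [512, 32]
# For a unidirectional GRU this is the same vector as the last layer's final hidden state.
print(torch.allclose(last, h[-1]))  # True

y = fc(last)                   # all 5 outputs are computed from this single 32-dim vector
print(y.shape)                 # torch.Size([512, 5])
```

Note that `h` is laid out as [num_layers, batch, hidden] even with `batch_first=True`. The point is that the Linear layer sees one 32-dimensional summary per sample, not the 10 time steps.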

I'm getting good results (very small MSE loss, and the predictions look good),

but I'm not sure whether the model's outputs (the 5 sequence values) are really ordered by the model, i.e., whether the second output is based on the first, the third on the second, and so on.

I know that the GRU output is based on the learned sequence history. But I also use a linear layer, so is the output (after the linear layer) still ordered by time?

Solution

UPDATE

This answer isn't quite right; see this follow-up question. The best way to answer is to write out the math and show that the 5 scalar outputs aren't functions of each other.
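A short version of that derivation for the model in the question. Here $h_T \in \mathbb{R}^{32}$ is the GRU output at the last time step for one sample (`output[:, -1, :]`), and $W \in \mathbb{R}^{5 \times 32}$, $b \in \mathbb{R}^{5}$ are `self.fc.weight` and `self.fc.bias`; the symbols are just labels chosen for this sketch.

$$\hat{y} = W h_T + b, \qquad \hat{y}_i = w_i^\top h_T + b_i, \quad i = 1, \dots, 5,$$

where $w_i^\top$ is the $i$-th row of $W$. No $\hat{y}_j$ appears on the right-hand side of $\hat{y}_i$: each output is a function of $h_T$ (and hence of the input sequence) only. The five values share $h_T$, so they can be correlated, but their order comes only from training, because the MSE loss fits output $i$ to target step $i$.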

Old Answer

I'm not sure whether the model's outputs (the 5 sequence values) are really ordered by the model, i.e., whether the second output is based on the first, the third on the second, and so on.

No, they aren't. You can check that the gradients of, say, the last output w.r.t. the previous outputs are zero, which means that the last output isn't a function of the previous outputs.

model = MyModel()
x = torch.rand([2, 10, 3])
y = model(x)
y.retain_grad()  # allows accessing y.grad although y is a non-leaf Tensor
y[:, -1].sum().backward()  # computes gradients of last output
assert torch.allclose(y.grad[:, :-1], torch.tensor(0.))  # gradients w.r.t previous outputs are zeroes
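One caveat about the check above: `y.grad` holds the gradient of `y[:, -1].sum()` with respect to `y`, so its first four entries are zero by construction, whatever the model. A more direct sketch of the point made in the update recomputes each output from the last GRU state alone. The layers are rebuilt outside `MyModel` (random weights, arbitrary seed) so the snippet runs on its own.

```python
import torch
import torch.nn as nn

# Same layers as MyModel in the question, rebuilt so the snippet is self-contained.
torch.manual_seed(0)
gru = nn.GRU(input_size=3, hidden_size=32, num_layers=2, batch_first=True)
fc = nn.Linear(32, 5)

x = torch.rand(2, 10, 3)
output, _ = gru(x)
h_T = output[:, -1, :]   # [2, 32]
y = fc(h_T)              # [2, 5], all outputs computed at once

# Recompute each output on its own from h_T and its own row of fc.weight;
# no other output y_j is used when computing y_i.
y_separate = torch.stack([h_T @ fc.weight[i] + fc.bias[i] for i in range(5)], dim=1)
print(torch.allclose(y, y_separate, atol=1e-6))  # True

# The Jacobian of the 5 outputs w.r.t. h_T (for one sample) is exactly fc.weight:
# row i says how output i depends on h_T, and nothing else.
jac = torch.autograd.functional.jacobian(fc, h_T[0])  # [5, 32]
print(torch.allclose(jac, fc.weight))                 # True
```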

A popular model for capturing dependencies among output labels is the conditional random field (CRF). But since you're already happy with the predictions of the current model, perhaps modelling the output dependencies isn't that important.
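For contrast, here is a sketch of a model in which each output really is based on the previous one. Note that this is a different technique from a CRF: a plain autoregressive decoder that feeds each predicted step back in. The class name `AutoregressiveModel` and its layout (a `GRUCell` decoder sharing one output layer) are illustrative assumptions, not something from the question or the answer above.

```python
import torch
import torch.nn as nn

class AutoregressiveModel(nn.Module):
    """Hypothetical variant: each predicted step is fed back in to produce the next one."""

    def __init__(self, hidden_size: int = 32, horizon: int = 5):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.GRU(input_size=3, hidden_size=hidden_size, num_layers=2, batch_first=True)
        self.decoder = nn.GRUCell(input_size=1, hidden_size=hidden_size)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, input_series):
        _, h = self.encoder(input_series)  # h: [num_layers, batch, hidden]
        state = h[-1]                      # last layer's final state: [batch, hidden]
        prev = self.fc(state)              # first prediction: [batch, 1]
        outputs = [prev]
        for _ in range(self.horizon - 1):
            # The next state, and therefore the next prediction, depends on the previous prediction.
            state = self.decoder(prev, state)
            prev = self.fc(state)
            outputs.append(prev)
        return torch.stack(outputs, dim=1)  # [batch, horizon, 1]

model = AutoregressiveModel()
y = model(torch.rand(4, 10, 3))
print(y.shape)  # torch.Size([4, 5, 1])
```

The output shape matches `MyModel`, so it could be swapped in without changing the training loop; whether it predicts better is an empirical question.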