I'm dealing with the following senario:
My input has the shape of: [batch_size, input_sequence_length, input_features]
where:
input_sequence_length = 10
input_features = 3
My output has the shape of: [batch_size, output_sequence_length]
where:
output_sequence_length = 5
i.e: for each time slot of 10 units (each slot with 3 features) I need to predict the next 5 slots values.
I built the following model:
import torch
import torch.nn as nn
import torchinfo
class MyModel(nn.Module):
def __init__(self):
super(MyModel, self).__init__()
self.GRU = nn.GRU(input_size=3, hidden_size=32, num_layers=2, batch_first=True)
self.fc = nn.Linear(32, 5)
def forward(self, input_series):
output, h = self.GRU(input_series)
output = output[:, -1, :] # get last state
output = self.fc(output)
output = output.view(-1, 5, 1) # reorginize output
return output
torchinfo.summary(MyModel(), (512, 10, 3))
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
MyModel [512, 5, 1] --
├─GRU: 1-1 [512, 10, 32] 9,888
├─Linear: 1-2 [512, 5] 165
==========================================================================================
I'm getting good results (very small MSE
loss, and the predictions looks good),
but I'm not sure if the model output (5 sequence values) are really ordered by the model ? i.e the second output based on the first output and the third output based on the second output ...
I know that the GRU
output based on the learned sequence history.
But I'm also used linear layer, so is the output (after the linear layer) still sorted by time ?
This answer isn't quite right, see this follow-up question. The best way is to write the math and show that the 5 scalar outputs aren't functions of each other.
I'm not sure if the model output (5 sequence values) are really ordered by the model ? i.e the second output based on the first output and the third output based on the second output
No, they aren't. You can check that the gradients of, say, the last output w.r.t to the previous outputs are zeroes, which basically means that the last output isn't a function of the previous outputs.
model = MyModel()
x = torch.rand([2, 10, 3])
y = model(x)
y.retain_grad() # allows accessing y.grad although y is a non-leaf Tensor
y[:, -1].sum().backward() # computes gradients of last output
assert torch.allclose(y.grad[:, :-1], torch.tensor(0.)) # gradients w.r.t previous outputs are zeroes
A popular model to capture dependencies among output labels is conditional random fields. But since you're already happy with the predictions of the current model, perhaps modelling the output dependencies isn't that important.