Tags: pytorch, recurrent-neural-network

Why does the output shape in a simple Elman RNN depend on the sequence length (while the hidden state shape doesn't)?


I am learning about RNNs and am trying to code one up using PyTorch. I am having some trouble understanding the output dimensions.

Here is some code for a simple RNN architecture:

import numpy as np
import torch
from torch import nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_dim, n_layers):
        super(RNN, self).__init__()
        self.hidden_dim = hidden_dim
        # batch_first=True -> input/output tensors are (batch, seq, feature)
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)

    def forward(self, x, hidden):
        # r_out: output for every time step, hidden: final hidden state
        r_out, hidden = self.rnn(x, hidden)

        return r_out, hidden

So, what I understand is that hidden_dim is the number of blocks (units) in my hidden layer, and essentially the number of features in both the output and the hidden state.

I create some dummy data to test it:

test_rnn = RNN(input_size=1, hidden_dim=4, n_layers=1)

# generate evenly spaced test data points
time_steps = np.linspace(0, 6, 3)
data = np.sin(time_steps)
data.resize((3, 1))

test_input = torch.Tensor(data).unsqueeze(0) # give it a batch_size of 1 as first dimension
print('Input size: ', test_input.size())

# test out rnn sizes
test_out, test_h = test_rnn(test_input, None)
print('Hidden state size: ', test_h.size())
print('Output size: ', test_out.size())

What I get is

Input size:  torch.Size([1, 3, 1])
Hidden state size:  torch.Size([1, 1, 4])
Output size:  torch.Size([1, 3, 4])

So I understand that the shape of x is (batch_size, seq_length, input_size): batch size 1, 3 time steps (the sequence length), and 1 input feature.
For the hidden state, the shape is (n_layers, batch_size, hidden_dim): 1 layer, batch size 1, and 4 units in my hidden layer.
What I don't get is the RNN output. From the documentation, r_out = (batch_size, time_step, hidden_size). Wasn't the output supposed to be the same as the hidden state that the hidden units produce? That is, if I have 4 units in my hidden layer, I would expect 4 numbers for the hidden state and 4 numbers for the output. Why is the time step a dimension of the output? Each hidden unit takes in some numbers and outputs a state S and an output Y, and both of these are equal, yes? I tried drawing a diagram, and this is what I came up with; help me understand what part of it I'm doing wrong. [diagram]
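
To check that I'm reading those shapes correctly, this is what I'd assert (just my own sanity check, reusing the variables from the code above):

# input:  (batch_size, seq_length, input_size)   because batch_first=True
# hidden: (n_layers, batch_size, hidden_dim)
# output: (batch_size, seq_length, hidden_dim)   <-- the part I don't get
assert test_input.size() == (1, 3, 1)
assert test_h.size() == (1, 1, 4)
assert test_out.size() == (1, 3, 4)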

So, TL;DR:
Why does the output shape in a simple Elman RNN depend on the sequence length, while the hidden state shape doesn't? In the diagram I drew, I see both of them being the same.


Solution

  • In the PyTorch API, the output is the sequence of hidden states produced during the RNN computation, i.e., there is one hidden state vector per input vector. The hidden state is the last of those states, the state the RNN ends up with after processing the whole input, so test_out[:, -1, :] equals test_h (up to the extra layer dimension in test_h); see the first sketch below.

    Vector Y in your diagram is the same thing as a hidden state Ht. It indeed has 4 numbers, but the state is different for every time step, so you get 4 numbers for every time step.

    The reason why PyTorch returns the whole sequence of outputs (= hidden states; in LSTMs the two are not the same, though) in addition to the final state is that you can have a batch of sequences of different lengths. In that case, the final state is not simply test_out[:, -1, :], because you need to select each sequence's final state based on its own length; see the second sketch below.
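
Here is a minimal sketch of the first point (my own check, assuming a fresh single-layer nn.RNN with the same sizes as in the question): it unrolls the Elman update h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh) by hand and verifies that each step's hidden state is exactly one row of r_out, and that the returned hidden state is the last of those rows.

import numpy as np
import torch
from torch import nn

torch.manual_seed(0)

# same sizes as in the question: 1 input feature, 4 hidden units, 1 layer
rnn = nn.RNN(input_size=1, hidden_size=4, num_layers=1, batch_first=True)

# same dummy input: batch of 1, 3 time steps, 1 feature
x = torch.tensor(np.sin(np.linspace(0, 6, 3)), dtype=torch.float32).reshape(1, 3, 1)

r_out, h_n = rnn(x)          # r_out: (1, 3, 4), h_n: (1, 1, 4)

# unroll the Elman update by hand
h = torch.zeros(1, 4)        # initial hidden state (what passing hidden=None means)
for t in range(x.size(1)):
    h = torch.tanh(x[:, t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                   + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
    # the "output" at step t is exactly the hidden state at step t
    assert torch.allclose(h, r_out[:, t], atol=1e-6)

# the returned hidden state is just the last of those per-step states
assert torch.allclose(h_n[0], r_out[:, -1], atol=1e-6)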
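
And a sketch of the variable-length case (the lengths here are made up, purely for illustration): in a padded batch, the last valid state of each sequence sits at a different time index, so you pick it by length instead of taking test_out[:, -1, :]; packing the sequences makes PyTorch do that selection for you.

import torch
from torch import nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=1, hidden_size=4, num_layers=1, batch_first=True)

# a padded batch of 2 sequences whose (made-up) true lengths are 3 and 2
x = torch.randn(2, 3, 1)
lengths = torch.tensor([3, 2])

r_out, _ = rnn(x)            # r_out: (2, 3, 4)

# r_out[:, -1, :] would return the state at the padded last step for the
# shorter sequence; instead, gather each sequence's state at index length - 1
idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, r_out.size(2))   # (2, 1, 4)
last_states = r_out.gather(1, idx).squeeze(1)                     # (2, 4)

# equivalently, packing the padded batch makes the RNN return the correct
# final state per sequence directly
packed = nn.utils.rnn.pack_padded_sequence(x, lengths, batch_first=True)
_, h_n = rnn(packed)         # h_n: (1, 2, 4)
assert torch.allclose(last_states, h_n[0], atol=1e-6)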