I have the simple RNN code below.
import torch
import torch.nn as nn

rnn = nn.RNN(1, 1, 1, bias=False, batch_first=True)
t = torch.ones(size=(1, 2, 1))
output, hidden = rnn(t)
print(rnn.weight_ih_l0)
print(rnn.weight_hh_l0)
print(output)
print(hidden)
# Outputs
Parameter containing:
tensor([[0.7199]], requires_grad=True)
Parameter containing:
tensor([[0.4698]], requires_grad=True)
tensor([[[0.6168],
[0.7656]]], grad_fn=<TransposeBackward1>)
tensor([[[0.7656]]], grad_fn=<StackBackward>)
My understanding from the PyTorch documentation is that the output above is just the hidden state at every timestep.
So I tried to calculate the output manually with the code below.
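From the docs, with the default tanh nonlinearity and bias=False, the recurrence should be h_t = tanh(W_ih * x_t + W_hh * h_(t-1)).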
# first timestep: no previous hidden state, so only the input term
hidden_state1 = torch.tanh(t[0][0] * rnn.weight_ih_l0)
print(hidden_state1)

# second timestep: input term plus the recurrent term from hidden_state1
hidden_state2 = torch.tanh(t[0][1] * rnn.weight_ih_l0 + hidden_state1 * rnn.weight_hh_l0)
print(hidden_state2)
tensor([[0.6168]], grad_fn=<TanhBackward>)
tensor([[0.7656]], grad_fn=<TanhBackward>)
The results match: hidden_state1 and hidden_state2 equal the two values in output.
Shouldn't the hidden states be multiplied by output weights to produce the output?
I looked for weights connecting the hidden state to the output, but there are none.
If nn.RNN only computes hidden states, could anyone tell me how to get the output?
Shouldn't the hidden states be multiplied by output weights to produce the output?
Yes and no; it depends on your problem formulation. Suppose you are dealing with a case where only the output from the last timestep matters. In that case it doesn't make sense to multiply the hidden state by an output weight at every timestep. That's why PyTorch only gives you the hidden states as an abstract intermediate value; after that you are free to do whatever your problem requires with them.
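For instance, in a many-to-one setup you might feed only the final hidden state into a head of your own. A minimal sketch (the classifier and its sizes here are my own illustration, not part of your code):
# hypothetical many-to-one head: use only the last layer's final hidden state
last_hidden = hidden[-1]            # shape (batch, hidden_size) = (1, 1)
classifier = nn.Linear(1, 3)        # project 1 hidden feature to, say, 3 classes
logits = classifier(last_hidden)    # shape (1, 3)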
In your particular case, suppose you want to apply another linear layer to the output at each timestep. You can do that simply by defining a linear layer and passing the hidden outputs through it.
# Linear layer on top of the hidden states
hidden_feature_size = 1   # the hidden_size of your RNN (1 in your case)
output_feature_size = 1   # whatever output size you need
lin_layer = nn.Linear(hidden_feature_size, output_feature_size)

# output has shape (batch, seq, hidden) because batch_first=True,
# so index the sequence dimension to get per-timestep outputs
# output from the first timestep
lin_layer(output[:, 0])
# output from the second timestep
lin_layer(output[:, 1])
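Since nn.Linear acts on the last dimension, you can also (just as an equivalent convenience, not something specific to your code) run it over the whole output tensor in one call:
lin_layer(output)   # shape (1, 2, output_feature_size): one projected value per timestep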