
Trying to understand LSTM parameter hidden_size in PyTorch


Going off the LSTM documentation: https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#torch.nn.LSTM

I will be referring to this picture for the rest of my post:

[Image: LSTM block diagram]

I think I understand what input_size is:

input_size - the number of input features per time step: so the input at each step is a vector, maybe a word embedding. (Represented as X_t in the above picture.)

What is hidden_size? According to this Stack Overflow post:

https://stackoverflow.com/a/50848068/14382150

hidden_size is referred to as the number of nodes in one LSTM cell, but he does not explain very well what he means by a node.

Can someone please give a clear explanation of what hidden_size refers to? Is it just the number of time steps the LSTM block is unrolled for?

Also, what size should it be? Is it a hyperparameter, similar to choosing the number of neurons in a fully connected neural network?

I have looked at multiple posts and videos, but I can't seem to understand this. Any help is appreciated.


Solution

  • The hidden_size is a hyperparameter: it refers to the dimensionality of the hidden-state vector h_t. It has nothing to do with the number of LSTM blocks, which is another hyperparameter (num_layers), and it is not the number of unrolled time steps either; that is determined by the length of the input sequence. The user in the post you linked explains the same thing.

    Since it is a hyper-parameter, what its value should be needs to be found empirically for the particular task at hand.

    If you pick hidden_size equal to H and the input size is X, notice that the parameter matrices W_ii, W_if, W_ig and W_io will each be of size HxX, whereas W_hi, W_hf, W_hg and W_ho will each be of size HxH. In practice, these 8 matrices (four HxX matrices and four HxH matrices) are implemented in PyTorch as two stacked matrices of size 4HxX and 4HxH, as the shape check after this answer shows.

    [Parameter shapes from the PyTorch docs:]

        weight_ih_l0 : (4*hidden_size, input_size)   = (4H, X)
        weight_hh_l0 : (4*hidden_size, hidden_size)  = (4H, H)
        bias_ih_l0   : (4*hidden_size,)              = (4H,)
        bias_hh_l0   : (4*hidden_size,)              = (4H,)

    The parameters of size 4H that you see here are the bias terms.
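
    To make this concrete, here is a minimal sketch (using arbitrary example sizes X = 10 and H = 20, and a single layer) that checks these shapes directly:

        import torch
        import torch.nn as nn

        X, H = 10, 20  # input_size and hidden_size (arbitrary example values)
        lstm = nn.LSTM(input_size=X, hidden_size=H, num_layers=1)

        # The four H x X input-to-hidden matrices are stacked into one
        # 4H x X matrix, and likewise for the four H x H hidden-to-hidden ones:
        print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10]) -> 4H x X
        print(lstm.weight_hh_l0.shape)  # torch.Size([80, 20]) -> 4H x H
        print(lstm.bias_ih_l0.shape)    # torch.Size([80])     -> 4H
        print(lstm.bias_hh_l0.shape)    # torch.Size([80])     -> 4H

        # hidden_size is also the dimensionality of h_t, independent of how
        # many time steps the LSTM is unrolled over:
        seq_len, batch = 5, 3
        x = torch.randn(seq_len, batch, X)  # (seq_len, batch, input_size)
        output, (h_n, c_n) = lstm(x)
        print(output.shape)  # torch.Size([5, 3, 20]) -> h_t at every time step
        print(h_n.shape)     # torch.Size([1, 3, 20]) -> final h_t

    Note that changing seq_len changes only the output's first dimension, not any parameter shape, which is exactly why hidden_size is unrelated to the number of unrolled time steps.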