Tags: nlp, pytorch, lstm, seq2seq, encoder-decoder

How is the number of dimensions after the LSTM decided in the Pointer Generator model in PyTorch?


I don't understand why the number of input and output dimensions is 2 * config.hidden_dim when applying the fully connected layer in the Encoder class (the last line of the snippet below).

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(config.vocab_size, config.emb_dim)
        init_wt_normal(self.embedding.weight)

        self.lstm = nn.LSTM(
            config.emb_dim, config.hidden_dim, num_layers=1, 
            batch_first=True, bidirectional=True)
        init_lstm_wt(self.lstm)

        self.W_h = nn.Linear(
            config.hidden_dim * 2, config.hidden_dim * 2, bias=False)

The code is taken from https://github.com/atulkum/pointer_summarizer/blob/master/training_ptr_gen/model.py. Please explain.


Solution

  • The reason is that the LSTM layer is bidirectional, i.e., there are in fact two LSTMs, each processing the input in one direction. Each of them returns vectors of dimension config.hidden_dim, and the two outputs are concatenated into vectors of dimension 2 * config.hidden_dim. The W_h layer is applied to these concatenated encoder states, so its input features must be 2 * config.hidden_dim; keeping the output at the same size is simply the model's choice of projection dimension.
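
    A quick way to see this is a minimal standalone sketch (arbitrary example sizes, not the repo's actual config values), which shows that a bidirectional nn.LSTM produces outputs whose last dimension is 2 * hidden_dim, exactly what W_h expects:

    import torch
    import torch.nn as nn

    emb_dim, hidden_dim = 128, 256   # example sizes, not the repo's config
    lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=1,
                   batch_first=True, bidirectional=True)
    W_h = nn.Linear(hidden_dim * 2, hidden_dim * 2, bias=False)

    x = torch.randn(4, 10, emb_dim)  # (batch, seq_len, emb_dim)
    output, (h_n, c_n) = lstm(x)

    # forward and backward hidden states are concatenated in the last dimension
    print(output.shape)              # torch.Size([4, 10, 512])
    print(W_h(output).shape)         # torch.Size([4, 10, 512]), matches the Linear layer's in/out features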