Tags: python, deep-learning, nlp, pytorch, recurrent-neural-network

What does padding_idx do in nn.Embedding()?


I'm learning PyTorch and I'm wondering what the padding_idx argument does in torch.nn.Embedding(n1, d1, padding_idx=0). I have looked everywhere and couldn't find an explanation I could follow. Can you show an example to illustrate this?


Solution

  • As per the docs, padding_idx pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index.

    What this means is that wherever you have an item equal to padding_idx, the output of the embedding layer at that index will be all zeros.

    Here is an example: say you have word embeddings for 1000 words, each 50-dimensional, i.e. num_embeddings=1000, embedding_dim=50. Then torch.nn.Embedding works like a lookup table (though the lookup table is trainable):

    import torch

    # vocabulary of 1000 words, each mapped to a trainable 50-dim vector
    emb_layer = torch.nn.Embedding(1000, 50)
    x = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
    y = emb_layer(x)  # looks up a 50-dim vector for every index in x
    

    y will be a tensor of shape 2x4x50. I hope this part is clear to you.
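
    For instance, you can verify the shape directly (continuing from the snippet above):

    print(y.shape)  # torch.Size([2, 4, 50])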

    Now, if I specify padding_idx=2, i.e.

    # same setup, but index 2 is now reserved as the padding index
    emb_layer = torch.nn.Embedding(1000, 50, padding_idx=2)
    x = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
    y = emb_layer(x)
    

    then the output will still be 2x4x50, but the 50-dim vectors at positions (0, 1) and (1, 2) will be all zeros, since x[0, 1] and x[1, 2] equal 2, the padding_idx. You can think of it as the third word in the lookup table (the lookup table is 0-indexed) not being used for training: its row is initialized to zeros and receives no gradient updates.
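
    Here is a minimal sketch to check both behaviors, assuming the default dense-gradient setup (so weight.grad is populated after backward()):

    import torch

    emb_layer = torch.nn.Embedding(1000, 50, padding_idx=2)
    x = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
    y = emb_layer(x)

    # positions where x == 2 come out as all-zero vectors
    print(torch.all(y[0, 1] == 0))  # tensor(True)
    print(torch.all(y[1, 2] == 0))  # tensor(True)

    # the row at padding_idx receives no gradient, so it is never updated
    y.sum().backward()
    print(torch.all(emb_layer.weight.grad[2] == 0))  # tensor(True)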