I'm learning PyTorch and I'm wondering what the padding_idx attribute does in torch.nn.Embedding(n1, d1, padding_idx=0).
I have looked everywhere and couldn't find an explanation I can follow.
Can you show an example to illustrate this?
As per the docs, padding_idx pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index.
What this means is that wherever the input has an item equal to padding_idx, the output of the embedding layer at that position will be all zeros (by default, since that row of the table starts out as zeros).
Here is an example:
Let us say you have word embeddings for 1000 words, each 50-dimensional, i.e. num_embeddings=1000 and embedding_dim=50. Then torch.nn.Embedding works like a lookup table (though the lookup table is trainable):
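import torch
emb_layer = torch.nn.Embedding(1000, 50)  # 1000 rows, each a 50-dim vector
x = torch.LongTensor([[1,2,4,5],[4,3,2,9]])  # 2x4 batch of word indices
y = emb_layer(x)  # looks up one row per index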
y will be a tensor of shape 2x4x50: one 50-dim vector for each of the 2x4 indices in x. I hope this part is clear to you.
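If it helps, here is a small sketch of the "lookup table" idea. It assumes a default CPU setup, and the manual seed is there only to make the random weights repeatable. It checks that the layer's output is simply the rows of emb_layer.weight selected by the indices in x:

```python
import torch

torch.manual_seed(0)  # only so repeated runs give the same random weights

emb_layer = torch.nn.Embedding(1000, 50)
x = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
y = emb_layer(x)

# All parameters live in one (num_embeddings, embedding_dim) matrix.
print(emb_layer.weight.shape)  # torch.Size([1000, 50])
print(y.shape)                 # torch.Size([2, 4, 50])

# Each output vector is the weight row picked out by the index in x.
same_as_indexing = torch.equal(y, emb_layer.weight[x])
print(same_as_indexing)        # True
```

So y[0, 0] is row 1 of the table, y[0, 1] is row 2, and so on.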
Now if I specify padding_idx=2, i.e.
emb_layer = torch.nn.Embedding(1000,50, padding_idx=2)
x = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
y = emb_layer(x)
then the output will still be 2x4x50, but the 50-dim vectors at y[0,1] and y[1,2] (0-indexed) will be all zeros, since x[0,1] and x[1,2] are both 2, which is equal to padding_idx.
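A quick way to check this yourself is sketched below. The variable names padding_row_is_zero, zero_positions and matches_padding are just illustrative. It relies on the other rows being randomly initialized, so in practice they are never all zeros:

```python
import torch

emb_layer = torch.nn.Embedding(1000, 50, padding_idx=2)
x = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
y = emb_layer(x)

# Row 2 of the weight matrix is initialized to zeros.
padding_row_is_zero = bool((emb_layer.weight[2] == 0).all())

# Positions in the output whose whole 50-dim vector is zero.
zero_positions = (y == 0).all(dim=-1)

# They should be exactly the positions where x holds the padding index.
matches_padding = torch.equal(zero_positions, x == 2)

print(padding_row_is_zero)  # True
print(zero_positions)       # True only at [0, 1] and [1, 2]
print(matches_padding)      # True
```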
You can think of padding_idx=2 as meaning that the 3rd entry in the lookup table (index 2, since the lookup table is 0-indexed) is not updated during training.
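To see the "not trained" part, here is a minimal sketch under some assumptions of my own: the summed output is a toy loss, and plain SGD with lr=0.1 is an arbitrary choice, neither of which is special. The gradient for the padding row stays zero, so an optimizer step leaves it untouched, while a row that is actually used does get a gradient:

```python
import torch

emb_layer = torch.nn.Embedding(1000, 50, padding_idx=2)
x = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])

# Toy scalar loss, chosen only so backward() has something to differentiate.
loss = emb_layer(x).sum()
loss.backward()

grad = emb_layer.weight.grad
padding_grad_is_zero = bool((grad[2] == 0).all())   # row 2: no gradient
other_grad_is_nonzero = bool((grad[4] != 0).all())  # row 4 is used twice in x

# One plain SGD step (no weight decay): the padding row is still all zeros.
optimizer = torch.optim.SGD(emb_layer.parameters(), lr=0.1)
optimizer.step()
padding_row_still_zero = bool((emb_layer.weight[2] == 0).all())

print(padding_grad_is_zero)    # True
print(other_grad_is_nonzero)   # True
print(padding_row_still_zero)  # True
```

Note that this only holds as long as nothing else modifies that row, e.g. overwriting the weights by hand.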