I used torchtext's vocab to convert the text to indices, but which function should I use to make all the index lists the same length before I send them to the net?
For example, I have two texts:
I am a good man
I would like a coffee please
After vocab:
[1, 3, 2, 5, 7]
[1, 9, 6, 2, 4, 8]
And what I want is:
[1, 3, 2, 5, 7, 0]
[1, 9, 6, 2, 4, 8]
Use torch.nn.utils.rnn.pad_sequence, which pads a batch of variable-length tensors to the length of the longest one. It is easy to understand by looking at the following example.
Code:
import torch

# three sequences of different lengths
v = [
    [0, 2],
    [0, 1, 2],
    [3, 3, 3, 3],
]

# pad_sequence pads every sequence with 0 (the default padding_value)
# to the length of the longest one; batch_first=True gives shape (batch, max_len)
torch.nn.utils.rnn.pad_sequence([torch.tensor(p) for p in v], batch_first=True)
Result:
tensor([[0, 2, 0, 0],
        [0, 1, 2, 0],
        [3, 3, 3, 3]])
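Applied to the two index lists from your question, the same call gives exactly the padded batch you want (a minimal sketch; the tensor values are just the indices shown above):

import torch
from torch.nn.utils.rnn import pad_sequence

# the two index lists from the question
a = torch.tensor([1, 3, 2, 5, 7])
b = torch.tensor([1, 9, 6, 2, 4, 8])

# pad the shorter sequence with 0 up to the longest length
batch = pad_sequence([a, b], batch_first=True)
print(batch)
# tensor([[1, 3, 2, 5, 7, 0],
#         [1, 9, 6, 2, 4, 8]])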