I'm running code that uses pad_to_max_length = True and everything works fine. The only issue is that I get the following warning:
FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True or padding='longest' to pad to the longest sequence in the batch, or use padding='max_length' to pad to a max length. In this case, you can give a specific length with max_length (e.g. max_length=45) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
But when I change pad_to_max_length = True to padding='max_length', I get this error:
RuntimeError: stack expects each tensor to be equal size, but got [60] at entry 0 and [64] at entry 6
How can I change the code to the new version? Did I misread something in the warning?
This is my encoder:
encoding = self.tokenizer.encode_plus(
    poem,
    add_special_tokens=True,
    max_length=60,
    return_token_type_ids=False,
    pad_to_max_length=True,
    return_attention_mask=True,
    return_tensors='pt',
)
The warning indeed leaves out one detail: you also need to pass truncation=True to mimic the old pad_to_max_length = True behaviour. With padding='max_length' alone, sequences longer than max_length are no longer cut down, so the batch ends up with tensors of different lengths (60 and 64 in your case) that can't be stacked.
like this:
encoding = self.tokenizer.encode_plus(
    poem,
    add_special_tokens=True,
    max_length=self.max_len,
    return_token_type_ids=False,
    padding='max_length',
    truncation=True,
    return_attention_mask=True,
    return_tensors='pt',
)
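For completeness, on recent transformers versions you can also call the tokenizer directly instead of encode_plus. Here is a minimal sketch assuming a BERT checkpoint and a max length of 60 (the model name and sample text are placeholders, not taken from your code):

from transformers import AutoTokenizer

# Placeholder checkpoint; use whatever model your self.tokenizer was built from.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

poem = "Some example text that may run past the maximum length ..."

# padding='max_length' pads short inputs up to max_length, and
# truncation=True cuts longer ones down, so every sample comes out
# with exactly 60 tokens and batches can be stacked without errors.
encoding = tokenizer(
    poem,
    add_special_tokens=True,
    max_length=60,
    padding='max_length',
    truncation=True,
    return_token_type_ids=False,
    return_attention_mask=True,
    return_tensors='pt',
)

print(encoding['input_ids'].shape)  # torch.Size([1, 60])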