Tags: pytorch, torchtext

torchtext data build_vocab / data_field


I have a question about torchtext.

I am working on abstractive text summarization, and I have built a seq2seq model with PyTorch.

I am wondering about the data field whose vocabulary is built by the build_vocab function in torchtext.

In machine translation, I understand that two data fields (input, output) are needed.

But in summarization, the input data and the output data are in the same language.

In this case, should I create two data fields (full_sentence, abstract_sentence)?

Or is it okay to use only one data field?

I'm afraid that the wrong choice will hurt the model's performance.

Please give me a hint.


Solution

  • You are right: in summarization and other tasks where the input and output share a language, it makes sense to build and use the same vocabulary (and the same field) for both. A sketch of this setup follows below.
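
    A minimal sketch of a shared field, assuming the classic torchtext.data API (moved to torchtext.legacy.data in torchtext >= 0.9), the spaCy tokenizer, and a hypothetical CSV layout with the article in the first column and the summary in the second; file names, paths, and column mapping are illustrative only:

        # Shared Field for source articles and target summaries.
        from torchtext.data import Field, TabularDataset, BucketIterator

        # One Field used by both columns, since input and output
        # are in the same language.
        TEXT = Field(tokenize="spacy",
                     init_token="<sos>",
                     eos_token="<eos>",
                     lower=True)

        # Positional mapping: column 0 -> src (article), column 1 -> trg (summary).
        fields = [("src", TEXT), ("trg", TEXT)]

        train_data, valid_data = TabularDataset.splits(
            path="data",            # assumed data directory
            train="train.csv",
            validation="valid.csv",
            format="csv",
            fields=fields,
            skip_header=True,
        )

        # build_vocab collects tokens from every column that uses this Field,
        # so the vocabulary covers both the articles and the summaries.
        TEXT.build_vocab(train_data, min_freq=2)

        train_iter, valid_iter = BucketIterator.splits(
            (train_data, valid_data),
            batch_size=32,
            sort_key=lambda ex: len(ex.src),
        )

    A side benefit of the shared vocabulary is that, if you want, you can tie the encoder and decoder embedding weights in your seq2seq model, since both sides index into the same vocab.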