Tags: deep-learning, nlp, pytorch, nlg

Deep Learning methods for Text Generation (PyTorch)


Greetings to everyone,

I want to design a system that can generate stories or poetry from a large dataset of text, without needing a text description/start/summary as input at inference time.

So far I have done this using RNNs, but as you know they have a lot of flaws. My question is: what are currently the best methods for this task? I looked into attention mechanisms, but it seems they are mainly suited to translation tasks.

I know about GPT-2, BERT, Transformers, etc., but all of them seem to need a text description as input before generation, and that is not what I'm looking for. I want a system that can generate stories from scratch after training.

Thanks a lot!


Solution

  • edit

    So the comment was: "I want to generate text from scratch, not starting from a given sentence at inference time. I hope it makes sense."

    Yes, you can do that; it is just a small amount of code on top of the ready-made models, be it BERT, GPT-2, or an LSTM-based RNN.

    How? You have to provide a seed input to the model yourself instead of a user prompt. Such input can be a randomly chosen word or phrase, or just a vector of zeroes (a minimal sketch follows below).

    Hope it helps.
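
    For the GPT-2 case, a minimal sketch might look like the following. It assumes the Hugging Face `transformers` library and a pretrained `gpt2` checkpoint (neither is mentioned above, so treat them as illustrative choices); the "seed" is just GPT-2's special end-of-text token rather than a user prompt, so the model generates from scratch.

    ```python
    # Sketch: unconditional generation with a pretrained GPT-2.
    # Assumes the Hugging Face `transformers` package is installed.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    # Seed with the special <|endoftext|> token instead of a user prompt.
    input_ids = torch.tensor([[tokenizer.bos_token_id]])

    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_length=100,
            do_sample=True,      # sample instead of greedy decoding
            top_k=50,
            top_p=0.95,
            pad_token_id=tokenizer.eos_token_id,
        )

    print(tokenizer.decode(output[0], skip_special_tokens=True))
    ```

    To get stories or poetry in your own style, you would first fine-tune the model on your corpus and then run the same sampling loop.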


    You have mixed up several things here.

    You can achieve what you want using either an LSTM-based or a transformer-based architecture.

    When you say you did it with an RNN, you probably mean that you tried an LSTM-based sequence-to-sequence model.

    Now, about the attention in your question: you can use attention to improve your RNN, but it is not a requirement. If you use a transformer architecture, attention is built into the transformer blocks.

    GPT-2 is nothing but a transformer-based model; its building block is the transformer architecture.

    BERT is another transformer-based architecture.

    So to answer your question: you can and should try an LSTM-based or a transformer-based architecture to achieve what you want. GPT-2 and BERT are simply particular realizations of the transformer idea.

    I encourage you to read this classic post from Karpathy; if you understand it, most of your questions will be answered:

    http://karpathy.github.io/2015/05/21/rnn-effectiveness/
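
    The idea in that post boils down to a sampling loop like the one below. This is only a sketch of the LSTM-based route: the model class, layer sizes, and the single-character seeding strategy are illustrative assumptions, and you would train `CharLSTM` on your corpus with a next-character prediction loss before sampling from it.

    ```python
    # Sketch: character-level LSTM generation "from scratch" in PyTorch.
    import torch
    import torch.nn as nn

    class CharLSTM(nn.Module):
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, vocab_size)

        def forward(self, x, state=None):
            emb = self.embed(x)                 # (batch, seq, embed_dim)
            out, state = self.lstm(emb, state)  # (batch, seq, hidden_dim)
            return self.head(out), state        # logits over the next character

    def sample(model, idx_to_char, start_idx, length=200, temperature=1.0):
        """Generate text with no prompt: seed with a single (e.g. random) character index."""
        model.eval()
        chars, state = [], None
        inp = torch.tensor([[start_idx]])       # shape (1, 1)
        with torch.no_grad():
            for _ in range(length):
                logits, state = model(inp, state)
                probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
                next_idx = torch.multinomial(probs, num_samples=1)
                chars.append(idx_to_char[next_idx.item()])
                inp = next_idx.view(1, 1)
        return "".join(chars)
    ```

    The seed character plays the same role as the BOS token in the GPT-2 sketch above: it is only there to start the recurrence, not to condition the output on a user-provided description.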