Tags: neural-network, keras, training-data, recurrent-neural-network, lstm

Keras LSTM training for text generation


I am working on a character-level text generator using Keras. While going through examples and tutorials, there is something that I still do not understand.

The training data (X) is being split into semi-redundant sequences of length maxlen, with y being the character immediately following each sequence.
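For reference, this is roughly the kind of slicing the standard Keras character-level example performs. The corpus, maxlen, and step values below are placeholders for illustration, not taken from any code in the question:

```python
import numpy as np

text = "some long training corpus ..."  # illustrative corpus
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}

maxlen = 50  # length of each input sequence (assumed value)
step = 3     # stride between sequences, which makes them "semi-redundant"

sequences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sequences.append(text[i:i + maxlen])  # input: maxlen characters
    next_chars.append(text[i + maxlen])   # target: the character that follows

# One-hot encode: X has shape (num_sequences, maxlen, num_chars),
# y has shape (num_sequences, num_chars)
X = np.zeros((len(sequences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sequences), len(chars)), dtype=bool)
for i, seq in enumerate(sequences):
    for t, ch in enumerate(seq):
        X[i, t, char_to_idx[ch]] = 1
    y[i, char_to_idx[next_chars[i]]] = 1
```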

I understand that this is for efficiency, as it means the training will only capture dependencies within maxlen characters.

I am struggling to understand why it is done in sequences, though. I thought LSTMs/RNNs were trained by inputting characters one at a time and comparing the predicted next character to the actual next character. That seems very different from inputting, say, maxlen=50 characters at a time and comparing a length-50 sequence to the single next character.

Does Keras actually break up the training sequences and input them character by character "under the hood"?

If not, why?


Solution

  • Because you are doing sequence generation, I assume you are setting stateful=True on your recurrent layers. Without this flag, the different sequences (and therefore their characters) are treated as independent of one another, which I don't think is what you want here. If the flag is set to True, the two approaches are equivalent, and dividing the text into sequences is done for performance and simplicity.
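To illustrate the flag the answer refers to, here is a minimal sketch of a character-level model defined with and without stateful=True. The layer size, batch size, and input dimensions are placeholder values, not something prescribed by the answer:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

maxlen, num_chars = 50, 60  # placeholder values
batch_size = 32

# Stateless (the default): the LSTM state is reset after every sequence,
# so each maxlen-character window is treated as independent.
stateless_model = Sequential([
    LSTM(128, input_shape=(maxlen, num_chars)),
    Dense(num_chars, activation="softmax"),
])
stateless_model.compile(loss="categorical_crossentropy", optimizer="adam")

# Stateful: the final state of each sequence in a batch is carried over to
# the sequence at the same index in the next batch, so dependencies can
# extend across window boundaries. This requires a fixed batch size and
# batches ordered so that consecutive windows line up.
stateful_model = Sequential([
    LSTM(128, batch_input_shape=(batch_size, maxlen, num_chars), stateful=True),
    Dense(num_chars, activation="softmax"),
])
stateful_model.compile(loss="categorical_crossentropy", optimizer="adam")
```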