Tags: machine-learning, tensorflow, recurrent-neural-network, lstm, cntk

Training RNNs on long sequences


I'm training an LSTM network and want to understand best practices for training on long sequences, on the order of 1,000 time steps or more. What is a good approach to choosing a minibatch size? How would skew in label prevalence influence that choice? (Positives are rare in my scenario.) Is it worthwhile to make an effort to rebalance my data? Thanks.


Solution

  • You probably want to rebalance the classes so they are roughly 50/50. Otherwise the model will tend to collapse toward the majority class (see the first sketch below for a class-weighting alternative).

    As for the batch size, I would go as large as will fit in memory.

    I am not sure LSTMs will be able to learn dependencies at O(1k) time steps, but it is worth a try. You could look into something like WaveNet if you want ultra-long dependencies (see the dilated-convolution sketch after the link).

    https://deepmind.com/blog/wavenet-generative-model-raw-audio/
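
To illustrate the rebalancing and batch-size points, here is a minimal Keras-style sketch. The data, shapes, and model sizes are placeholders invented for the example (nothing here comes from the original question); class weighting is shown as one alternative to physically resampling the data to 50/50.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data: 1,000 sequences of length 1,000 with 8 features,
# and rare positive labels (~5%). Shapes and names are illustrative only.
seq_len, n_features = 1000, 8
X_train = np.random.randn(1000, seq_len, n_features).astype("float32")
y_train = (np.random.rand(1000) < 0.05).astype("float32")

num_pos = int(y_train.sum())
num_neg = len(y_train) - num_pos

# Up-weight the rare positive class so each class contributes roughly equally
# to the loss -- an alternative to physically resampling the dataset to 50/50.
class_weight = {0: 1.0, 1: num_neg / max(num_pos, 1)}

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(seq_len, n_features)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Batch size: start as large as memory allows and back off if you hit OOM errors.
model.fit(X_train, y_train, batch_size=256, epochs=5, class_weight=class_weight)
```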
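The WaveNet pointer refers to stacks of dilated causal convolutions, whose receptive field grows exponentially with depth and can therefore cover very long contexts without recurrence. Below is a rough sketch of that idea for sequence classification; the filter counts, dilation rates, and pooling head are illustrative choices, not taken from the WaveNet paper.

```python
import tensorflow as tf

# Stack of dilated causal 1-D convolutions: with kernel size 2 and dilation
# rates 1, 2, 4, ..., 512, the receptive field spans ~1,024 time steps.
def dilated_conv_stack(n_features, filters=32,
                       dilations=(1, 2, 4, 8, 16, 32, 64, 128, 256, 512)):
    inputs = tf.keras.Input(shape=(None, n_features))
    x = inputs
    for d in dilations:
        x = tf.keras.layers.Conv1D(
            filters, kernel_size=2, padding="causal",
            dilation_rate=d, activation="relu",
        )(x)
    # Pool over time and predict a single probability per sequence.
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

model = dilated_conv_stack(n_features=8)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```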