I have posted this question as an issue on the Keras GitHub but figured it might reach a broader audience here.
System information
Describe the current behavior
I am executing the code from the Seq2Seq tutorial. The one and only change I made was to swap the LSTM layers for CuDNNLSTM. What happens is that the model predicts a fixed output for any input I give it. When I run the original code, I get sensible results.
Describe the expected behavior
See preceding section.
Code to reproduce the issue
Taken from here. Simply replace LSTM with CuDNNLSTM.
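For reference, this is roughly what the swapped encoder/decoder looks like. It is a minimal sketch assuming the variable names from the tutorial (num_encoder_tokens, num_decoder_tokens, latent_dim); the placeholder values below are mine, not the tutorial's actual vocabulary sizes:

```python
from keras.models import Model
from keras.layers import Input, Dense, CuDNNLSTM

# Placeholder sizes; the tutorial derives these from the data
num_encoder_tokens = 71
num_decoder_tokens = 93
latent_dim = 256

# Encoder: LSTM swapped for CuDNNLSTM, keeping return_state for the context vectors
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = CuDNNLSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: CuDNNLSTM initialised with the encoder states
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = CuDNNLSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
```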
Any insights are greatly appreciated.
So there are two problems here: the use of CuDNNLSTM, and parameter tuning.
Basically, the network overfits on your dataset, which leads to the output being only one sentence for every input. This is the fault of neither CuDNNLSTM nor LSTM.
Firstly, CuDNNLSTM uses slightly different maths from the regular LSTM to make it CUDA-compatible and run faster. For the same code you used, LSTM takes 11 seconds per epoch on the eng-hindi file, while CuDNNLSTM takes 1 second per epoch.
In CuDNNLSTM the time_major param is set to false, and for this reason the network overfits. You can check it here.
You can clearly see that for small datasets like eng-hin or eng-marathi the val_loss increases after 30 epochs. There is no point in training the network further once your training loss is decreasing while val_loss is increasing; a simple way to enforce this is an early-stopping callback, as sketched below. The case is the same with LSTM.
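A minimal sketch of that, assuming the model and the encoder_input_data / decoder_input_data / decoder_target_data arrays from the tutorial; the patience value is only an illustration:

```python
from keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for a few epochs and keep the best weights
# (restore_best_weights needs Keras >= 2.2.3; drop it on older versions)
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=64,
          epochs=100,
          validation_split=0.2,
          callbacks=[early_stop])
```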
For small datasets like these you need parameter tuning; one possible direction is sketched below.
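What exactly to tune depends on your data. As one illustrative sketch (the smaller latent_dim and the L2 factor are assumptions, not values from the tutorial), you can shrink the model and regularise the recurrent weights, since CuDNNLSTM has no dropout arguments:

```python
from keras import regularizers
from keras.layers import CuDNNLSTM

# Smaller hidden size and L2 regularisation for a small dataset (illustrative values)
latent_dim = 128
encoder = CuDNNLSTM(latent_dim,
                    return_state=True,
                    kernel_regularizer=regularizers.l2(1e-4),
                    recurrent_regularizer=regularizers.l2(1e-4))
```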
Here are a few links which can help: