When should I use one or the other? Tutorials and examples use either `Sequential([Stabilizer(), Recurrence(LSTM(hidden_dim))])` or `LSTMP_component_with_self_stabilization` from Examples/common/nn.py. I've tried replacing the former with `Recurrence(LSTM(hidden_dim, enable_self_stabilization=True))` in the char_rnn.py example, but the results are significantly worse.
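For concreteness, here is a sketch of the two variants I'm comparing (`hidden_dim` is a placeholder; the surrounding model is omitted):

```python
from cntk.layers import Sequential, Stabilizer, Recurrence, LSTM

hidden_dim = 512  # placeholder

# variant 1: explicit Stabilizer in front of the recurrence
model_a = Sequential([Stabilizer(), Recurrence(LSTM(hidden_dim))])

# variant 2: self-stabilization built into the LSTM step function
model_b = Recurrence(LSTM(hidden_dim, enable_self_stabilization=True))
```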
The `Stabilizer` layer multiplies its input with a learnable scalar. This simple trick has been shown to significantly improve convergence and stability. It has some similarity with `BatchNormalization`. Generally, when you can use `BatchNormalization`, you should try that first. Where that is not possible, specifically inside recurrent loops, I recommend using `Stabilizer` instead.
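A minimal sketch of that placement rule, assuming a simple sequence classifier (all dimensions are placeholders):

```python
from cntk.layers import (Sequential, Embedding, Stabilizer, Recurrence,
                         LSTM, BatchNormalization, Dense)

num_classes = 10  # placeholder

model = Sequential([
    Embedding(300),          # feed-forward part (placeholder dimension)
    Stabilizer(),            # BatchNormalization cannot go inside the recurrent
    Recurrence(LSTM(512)),   #   loop, so a Stabilizer guards its input instead
    BatchNormalization(),    # outside the loop, BatchNormalization is an option
    Dense(num_classes)
])
```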
Normally, you must inject it explicitly into your model. A special case is the recurrent step functions (e.g. `LSTM`), which include `Stabilizer`s inside; use `enable_self_stabilization=True` to enable them. Those built-in `Stabilizer`s only apply to internal variables. For the main input, you must insert a `Stabilizer` yourself.
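Putting that together, a sketch of the resulting pattern (the dimension is a placeholder):

```python
from cntk.layers import Sequential, Stabilizer, Recurrence, LSTM

model = Sequential([
    Stabilizer(),  # explicit: stabilizes the main input, which the built-ins do not cover
    Recurrence(LSTM(512, enable_self_stabilization=True))  # built-ins: internal variables only
])
```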
If you include explicit `Stabilizer`s but set `enable_self_stabilization=False` (e.g. via `default_options`), then those explicit `Stabilizer`s are no-ops.
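For example (a sketch), flipping the flag through `default_options` affects every `Stabilizer` in scope:

```python
from cntk.layers import default_options, Sequential, Stabilizer, Recurrence, LSTM

# enable_self_stabilization=True: the LSTM's built-in Stabilizers are active,
# and the explicit Stabilizer() works as expected
with default_options(enable_self_stabilization=True):
    model = Sequential([
        Stabilizer(),
        Recurrence(LSTM(512))
    ])

# with enable_self_stabilization=False in the scope instead, the explicit
# Stabilizer() above would silently become a no-op
```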
It is not my experience that `Stabilizer` makes things worse; it is generally a sure-fire way to improve convergence. It does change numeric ranges, though. So if it makes convergence worse, I suggest experimenting with different hyper-parameter settings, e.g. reducing the learning rate.