
What's the difference between Stabilizer() block and enable_self_stabilization parameter?


When should I use one or another? Tutorials and examples use either Sequential([Stabilizer(), Recurrence(LSTM(hidden_dim))]) or LSTMP_component_with_self_stabilization from Examples/common/nn.py. I've tried replacing the former with Recurrence(LSTM(hidden_dim, enable_self_stabilization=True)) in the char_rnn.py example, but the results are significantly worse.


Solution

  • The Stabilizer layer multiplies its input by a learnable scalar. This simple trick has been shown to significantly improve convergence and stability. It has some similarity to BatchNormalization. Generally, when you can use BatchNormalization, you should try that first. Where that is not possible (specifically, inside recurrent loops), I recommend using Stabilizer instead; see the first sketch below.

    Normally, you must inject it explicitly into your model. A special case is the recurrent step functions (e.g. LSTM), which include Stabilizers internally. Use enable_self_stabilization=True to enable that. Those built-in Stabilizers only apply to the internal variables; for the main input, you must insert a Stabilizer yourself, as shown in the first sketch below.

    If you include explicit Stabilizers but set enable_self_stabilization=False (e.g. via default_options), then those explicit Stabilizers are no-ops (see the second sketch below).

    It is not my experience that Stabilizer makes things worse. It is generally a sure-fire way to improve convergence. It does change numeric ranges, though, so if it makes convergence worse, I suggest experimenting with different hyper-parameter settings, e.g. reducing the learning rate.
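
    Here is a minimal sketch contrasting the two approaches (hidden_dim and num_labels are placeholder sizes, not taken from the char_rnn.py example):

        from cntk.layers import Sequential, Stabilizer, Recurrence, LSTM, Dense

        hidden_dim, num_labels = 256, 10   # placeholder sizes

        # (a) explicit Stabilizer layer: stabilizes the main input of the
        #     recurrence, but not the LSTM's internal variables
        model_a = Sequential([
            Stabilizer(),
            Recurrence(LSTM(hidden_dim)),
            Dense(num_labels)])

        # (b) self-stabilization inside the LSTM step function, plus an
        #     explicit Stabilizer for the main input (the built-in ones
        #     only cover the LSTM's internal variables)
        model_b = Sequential([
            Stabilizer(),
            Recurrence(LSTM(hidden_dim, enable_self_stabilization=True)),
            Dense(num_labels)])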
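
    And a sketch of the no-op case: when enable_self_stabilization=False is set via default_options, explicit Stabilizer() layers degenerate into pass-throughs:

        from cntk.layers import default_options, Sequential, Stabilizer, Recurrence, LSTM

        with default_options(enable_self_stabilization=False):
            model = Sequential([
                Stabilizer(),              # becomes an identity (no-op) under this default
                Recurrence(LSTM(256))])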