Tags: neural-network, theano, deep-learning, lasagne

How to properly add and use BatchNormLayer?


Introduction

According to the lasagne docs : "This layer should be inserted between a linear transformation (such as a DenseLayer, or Conv2DLayer) and its nonlinearity. The convenience function batch_norm() modifies an existing layer to insert batch normalization in front of its nonlinearity."
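In other words, the layer's linear transformation and its nonlinearity are split apart, and the BatchNormLayer goes in between. A minimal sketch of that pattern for a dense layer (layer names and sizes are illustrative):

    import lasagne
    from lasagne.layers import InputLayer, DenseLayer, BatchNormLayer, NonlinearityLayer

    l_in = InputLayer(shape=(None, 100))
    # Linear part only: no nonlinearity, and no bias, since BatchNormLayer's
    # beta parameter makes the bias redundant.
    l_dense = DenseLayer(l_in, num_units=64, nonlinearity=None, b=None)
    l_bn = BatchNormLayer(l_dense)
    l_out = NonlinearityLayer(l_bn, nonlinearity=lasagne.nonlinearities.rectify)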

However, Lasagne also provides the utility function:

lasagne.layers.batch_norm

However, due to the implementation on my end, I can't use that function.

My question is: how and where should I add the BatchNormLayer?

class lasagne.layers.BatchNormLayer(incoming, axes='auto', epsilon=1e-4, alpha=0.1, beta=lasagne.init.Constant(0), gamma=lasagne.init.Constant(1), mean=lasagne.init.Constant(0), inv_std=lasagne.init.Constant(1), **kwargs)
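For reference, the default axes='auto' normalizes over all axes except the second, so statistics are kept per channel: for dense layers this means normalizing over the minibatch axis, and for convolutional layers additionally over the spatial axes. A small sketch of the equivalence for 4D input (shapes are illustrative):

    from lasagne.layers import InputLayer, BatchNormLayer

    l_in = InputLayer(shape=(None, 1, 28, 28))  # (batch, channels, rows, cols)
    # On 4D input, axes='auto' is equivalent to axes=(0, 2, 3): normalize over
    # the batch and spatial axes, keeping one mean/std per channel.
    l_bn = BatchNormLayer(l_in, axes=(0, 2, 3))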

Can I add it after a convolution layer? Or should I add it after the maxpool? Do I have to manually remove the bias of the layers?

Approach used: so far, I have used it only like this:

def build_network(height, width):
        import lasagne
        import theano
        import theano.tensor as T
        from lasagne.layers import BatchNormLayer

        input_var = T.tensor4('inputs')
        target_var = T.fmatrix('targets')

        network = lasagne.layers.InputLayer(shape=(None, 1, height, width), input_var=input_var)

        # NOTE: placed here, this layer normalizes the raw input images,
        # not the pre-activations of a convolution or dense layer.
        network = BatchNormLayer(network,
                                 axes='auto',
                                 epsilon=1e-4,
                                 alpha=0.1,
                                 beta=lasagne.init.Constant(0),
                                 gamma=lasagne.init.Constant(1),
                                 mean=lasagne.init.Constant(0),
                                 inv_std=lasagne.init.Constant(1))

        network = lasagne.layers.Conv2DLayer(
            network, num_filters=60, filter_size=(3, 3), stride=1, pad=2,
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

        network = lasagne.layers.Conv2DLayer(
            network, num_filters=60, filter_size=(3, 3), stride=1, pad=1,
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())


        network = lasagne.layers.MaxPool2DLayer(incoming=network, pool_size=(2, 2), stride=None, pad=(0, 0),
                                                ignore_border=True)


        network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=0.5),
            num_units=32,
            nonlinearity=lasagne.nonlinearities.rectify)


        network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=0.5),
            num_units=1,
            nonlinearity=lasagne.nonlinearities.sigmoid)


        return network, input_var, target_var

References:

https://github.com/Lasagne/Lasagne/blob/master/lasagne/layers/normalization.py#L120-L320

http://lasagne.readthedocs.io/en/latest/modules/layers/normalization.html


Solution

  • If not using batch_norm:

    • BatchNormLayer should be added after the dense or convolution layer, before the nonlinearity.
    • Maxpool is a non-linear downsampling operation that keeps the highest value within each pooling window. Those values will already be normalized if you add a BatchNormLayer after your convolution/dense layers.
    • If not using batch_norm, remove the bias manually (pass b=None to the layer), since BatchNormLayer's beta parameter makes it redundant; see the manual sketch after the code below.

    Please test the code below and let us know if it works for what you are trying to accomplish. If it does not work, you can try adapting the batch_norm code.

    import lasagne
    import theano
    import theano.tensor as T
    from lasagne.layers import batch_norm
    
    input_var = T.tensor4('inputs')
    target_var = T.fmatrix('targets')
    
    network = lasagne.layers.InputLayer(shape=(None, 1, height, width), input_var=input_var)
    
    network = lasagne.layers.Conv2DLayer(
        network, num_filters=60, filter_size=(3, 3), stride=1, pad=2,
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.GlorotUniform())
    # batch_norm() inserts a BatchNormLayer in front of this layer's
    # nonlinearity and removes the layer's now-redundant bias automatically.
    network = batch_norm(network)
    
    network = lasagne.layers.Conv2DLayer(
        network, num_filters=60, filter_size=(3, 3), stride=1, pad=1,
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.GlorotUniform())
    network = batch_norm(network)
    
    network = lasagne.layers.MaxPool2DLayer(incoming=network, pool_size=(2, 2), stride=None, pad=(0, 0),
                                            ignore_border=True)
    
    network = lasagne.layers.DenseLayer(
        lasagne.layers.dropout(network, p=0.5),
        num_units=32,
        nonlinearity=lasagne.nonlinearities.rectify)
    network = batch_norm(network)
    
    network = lasagne.layers.DenseLayer(
        lasagne.layers.dropout(network, p=0.5),
        num_units=1,
        nonlinearity=lasagne.nonlinearities.sigmoid)
    network = batch_norm(network)
    
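    If you really cannot call batch_norm, you can reproduce by hand what it does internally: build the layer without its bias and with a linear (identity) nonlinearity, add a BatchNormLayer, then re-apply the nonlinearity. A sketch for a single convolution block (this mirrors, but is not, the library function):

    network = lasagne.layers.Conv2DLayer(
        network, num_filters=60, filter_size=(3, 3), stride=1, pad=2,
        nonlinearity=None,  # identity; the rectifier is applied after BN
        b=None,             # bias is redundant with BatchNormLayer's beta
        W=lasagne.init.GlorotUniform())
    network = lasagne.layers.BatchNormLayer(network)
    network = lasagne.layers.NonlinearityLayer(
        network, nonlinearity=lasagne.nonlinearities.rectify)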

    When getting the params to create the graph for your update method, remember to set trainable=True, so that the batch normalization layers' running mean and inv_std (which are stored parameters but not trained by gradient updates) are excluded:

    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.adadelta($YOUR_LOSS_HERE, params)
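
    Also note that batch normalization (like dropout) behaves differently at training and test time: the test-time graph must be built with deterministic=True so the stored running mean/inv_std are used instead of per-batch statistics. A sketch of both graphs, assuming the network, input_var and target_var from above and a binary cross-entropy loss (an assumption, to match the sigmoid output):

    # Training graph: BN uses per-batch statistics; dropout is active.
    train_out = lasagne.layers.get_output(network)
    loss = lasagne.objectives.binary_crossentropy(train_out, target_var).mean()

    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.adadelta(loss, params)
    train_fn = theano.function([input_var, target_var], loss, updates=updates)

    # Test graph: deterministic=True makes BN use its stored running averages
    # and disables dropout.
    test_out = lasagne.layers.get_output(network, deterministic=True)
    predict_fn = theano.function([input_var], test_out)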