Search code examples
pythonkerasneural-networkbatch-normalization

Where do I call the BatchNormalization function in Keras?


If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning?

I read this documentation for it: http://keras.io/layers/normalization/

I don't see where I'm supposed to call it. Below is my code attempting to use it:

model = Sequential()
keras.layers.normalization.BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None)
model.add(Dense(64, input_dim=14, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(2, init='uniform'))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd)
model.fit(X_train, y_train, nb_epoch=20, batch_size=16, show_accuracy=True, validation_split=0.2, verbose = 2)

I ask because if I run the code with the second line including the batch normalization and if I run the code without the second line I get similar outputs. So either I'm not calling the function in the right place, or I guess it doesn't make that much of a difference.


Solution

  • As Pavel said, Batch Normalization is just another layer, so you can use it as such to create your desired network architecture.

    The general use case is to use BN between the linear and non-linear layers in your network, because it normalizes the input to your activation function, so that you're centered in the linear section of the activation function (such as Sigmoid). There's a small discussion of it here

    In your case above, this might look like:

    # import BatchNormalization
    from keras.layers.normalization import BatchNormalization
    
    # instantiate model
    model = Sequential()
    
    # we can think of this chunk as the input layer
    model.add(Dense(64, input_dim=14, init='uniform'))
    model.add(BatchNormalization())
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    
    # we can think of this chunk as the hidden layer    
    model.add(Dense(64, init='uniform'))
    model.add(BatchNormalization())
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    
    # we can think of this chunk as the output layer
    model.add(Dense(2, init='uniform'))
    model.add(BatchNormalization())
    model.add(Activation('softmax'))
    
    # setting up the optimization of our weights 
    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='binary_crossentropy', optimizer=sgd)
    
    # running the fitting
    model.fit(X_train, y_train, nb_epoch=20, batch_size=16, show_accuracy=True, validation_split=0.2, verbose = 2)