I have the following architecture:
Conv1
Relu1
Pooling1
Conv2
Relu2
Pooling2
FullyConnect1
FullyConnect2
My question is, where do I apply batch normalization? And what would be the best function to do this in TensorFlow?
The original batch-norm paper prescribes applying batch norm before the ReLU activation. However, there is some evidence that it may work better after the activation. Here's a comment on the Keras GitHub by François Chollet:
... I can guarantee that recent code written by Christian [Szegedy] applies relu before BN. It is still occasionally a topic of debate, though.
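For illustration, here is a minimal sketch of the two orderings in TensorFlow 1.x (the input shape, filter count, and placeholder names are assumptions, not taken from your question):

```python
import tensorflow as tf

# Illustrative input; shape is an assumption.
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
is_training = tf.placeholder(tf.bool)

conv = tf.layers.conv2d(x, filters=32, kernel_size=3, padding='same')

# Ordering from the original paper: Conv -> BN -> ReLU
bn_then_relu = tf.nn.relu(tf.layers.batch_normalization(conv, training=is_training))

# Ordering Chollet refers to: Conv -> ReLU -> BN
relu_then_bn = tf.layers.batch_normalization(tf.nn.relu(conv), training=is_training)
```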
To your second question: in TensorFlow, you can use the high-level tf.layers.batch_normalization function, or the low-level tf.nn.batch_normalization.
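As a rough sketch of how your architecture could look with tf.layers.batch_normalization (here I place BN before each ReLU, as in the original paper; the filter sizes, the placeholder loss, and the layer widths are assumptions for illustration only). Note that tf.layers.batch_normalization keeps its moving-average updates in the UPDATE_OPS collection, so the train op has to depend on them:

```python
import tensorflow as tf

def model(x, is_training):
    # Conv1 -> BN -> Relu1 -> Pooling1
    net = tf.layers.conv2d(x, 32, 3, padding='same')
    net = tf.layers.batch_normalization(net, training=is_training)
    net = tf.nn.relu(net)
    net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)

    # Conv2 -> BN -> Relu2 -> Pooling2
    net = tf.layers.conv2d(net, 64, 3, padding='same')
    net = tf.layers.batch_normalization(net, training=is_training)
    net = tf.nn.relu(net)
    net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)

    # FullyConnect1 -> FullyConnect2
    net = tf.layers.flatten(net)
    net = tf.layers.dense(net, 128, activation=tf.nn.relu)
    return tf.layers.dense(net, 10)

x = tf.placeholder(tf.float32, [None, 28, 28, 1])
is_training = tf.placeholder(tf.bool)
logits = model(x, is_training)

loss = tf.reduce_mean(logits)  # stand-in loss, just for illustration

# The batch-norm moving averages live in UPDATE_OPS; make the optimizer wait for them.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer().minimize(loss)
```

At inference time you would feed is_training=False so the layer uses its accumulated moving statistics instead of the batch statistics.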