I'm using TensorFlow for a multi-target regression problem. Specifically, I'm working with a fully convolutional residual network for pixel-wise labeling, where the input is an image and the label is a mask. In my case the images are brain MR scans and the labels are masks of the tumors.
I have achieved fairly decent results with my net:
However, I am sure there is still room for improvement, so I wanted to add batch normalization. I implemented it as follows:
# Convolutional layer 1: conv -> batch norm -> ReLU
Z10 = tf.nn.conv2d(X, W_conv10, strides=[1, 1, 1, 1], padding='SAME')
Z10 = tf.contrib.layers.batch_norm(Z10, center=True, scale=True, is_training=train_flag)
A10 = tf.nn.relu(Z10)

# Convolutional layer 2 (strided): conv -> batch norm -> ReLU
Z1 = tf.nn.conv2d(A10, W_conv1, strides=[1, 2, 2, 1], padding='SAME')
Z1 = tf.contrib.layers.batch_norm(Z1, center=True, scale=True, is_training=train_flag)
A1 = tf.nn.relu(Z1)
for each of the conv and transpose layers of my net. But the results are not what I expected: the net with batch normalization performs terribly. In orange is the loss of the net without batch normalization, while the blue curve is with it:
Not only is the net learning more slowly, the predicted labels from the net with batch normalization are also very poor.
Does anyone know why this might be the case? Could it be my cost function? I am currently using:
# Per-pixel sigmoid cross-entropy on the output logits, averaged over the batch
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=dA1, labels=Y)
cost = tf.reduce_mean(loss)
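For context, here is a simplified sketch of how the training step is wired up (the optimizer choice, learning rate, and session calls are stand-ins rather than my exact code). train_flag is the boolean placeholder passed to every batch_norm call above, and since tf.contrib.layers.batch_norm collects its moving-average update ops in tf.GraphKeys.UPDATE_OPS by default, those ops are run together with the train op:

# train_flag is defined before the network is built:
# train_flag = tf.placeholder(tf.bool, name='train_flag')

# batch_norm puts its moving-mean/variance update ops into
# tf.GraphKeys.UPDATE_OPS, so they must run alongside the train op
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)

# Training step: batch statistics are used (is_training=True)
# sess.run(train_op, feed_dict={X: x_batch, Y: y_batch, train_flag: True})

# Evaluation: the accumulated moving averages are used (is_training=False)
# sess.run(cost, feed_dict={X: x_val, Y: y_val, train_flag: False})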
Batch normalization is a poor normalization choice for tasks where semantic information has to be preserved as it passes through the network; it tends to wash that information away. Look into conditional normalization methods, e.g. Adaptive Instance Normalization, to see the point, and also this paper: https://arxiv.org/abs/1903.07291.
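To make the contrast concrete, here is a rough sketch of adaptive instance normalization (my own illustration, not code from the paper), assuming NHWC feature maps and conditioning tensors gamma/beta supplied by some external input such as a style image:

def adaptive_instance_norm(x, gamma, beta, eps=1e-5):
    # x:            feature maps of shape [batch, height, width, channels]
    # gamma, beta:  conditioning tensors of shape [batch, 1, 1, channels]
    # Statistics are computed per sample and per channel, over the spatial
    # dimensions only, so nothing is averaged across different images.
    mean, var = tf.nn.moments(x, axes=[1, 2], keep_dims=True)
    x_norm = (x - mean) / tf.sqrt(var + eps)
    return gamma * x_norm + beta

Contrast this with batch normalization, where the mean and variance are computed across the whole batch, so the statistics of different images get mixed together and the scale/shift is a single learned pair shared across the dataset rather than something conditioned on each input.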