Tags: keras, deep-learning, batch-normalization, resnet

The BatchNorm Layer in Stage-1 of my ResNet is connected to all the other BatchNorm layers. Why?


Below are some screenshots of the ResNet model I implemented; the graphs were generated with TensorBoard.

Is it some kind of optimization that TensorFlow does in the backend?

I have implemented the code using Keras.

There are two blocks in the model: IdentityBlock and ConvolutionalBlock. Adding the code of these blocks triggers Stack Overflow's "your post is mostly code" error, so I have left them out (a rough sketch of the identity block is included after the ResNet code below for reference).

In the ResNet function (def ResNet) I used BatchNormalization, gave it the name 'bnl_stg-1', and passed it only one input (X). But for some reason it connects to all the BatchNorm layers in the identity and convolutional blocks, as shown in the images.

Here is the code:

from keras.layers import (Input, ZeroPadding2D, Conv2D, BatchNormalization,
                          Activation, MaxPooling2D, AveragePooling2D,
                          Flatten, Dense)
from keras.models import Model


def ResNet(input_shape, features):
    '''
    Implements the ResNet50 model:
    [Conv2D -> BatchNorm -> ReLU -> MaxPool2D] --> [ConvBlock -> IdentityBlock * 2] --> [ConvBlock -> IdentityBlock * 3] --> [AveragePool2D -> Flatten -> Dense -> Sigmoid]
    '''
    X_input = Input(input_shape)

    X = ZeroPadding2D((3, 3))(X_input)

    # Stage 1
    X = Conv2D(filters=64,
               kernel_size=(7, 7),
               strides=(2, 2),
               name='cnl_stg-1',
               kernel_initializer='glorot_uniform')(X)

    X = BatchNormalization(axis=3,
                           name='bnl_stg-1')(X)

    X = Activation('relu')(X)

    X = MaxPooling2D(pool_size=(3, 3),
                     strides=(2, 2))(X)

    # Stage 2
    X = convolutional_block(X, f=3, filters=[64, 64, 256], stage=2, s=1)
    X = identity_block(X, 3, [64, 64, 256], stage=2, block=1)
    X = identity_block(X, 3, [64, 64, 256], stage=2, block=2)

    # Stage 3
    X = convolutional_block(X, f=3, filters=[128, 128, 512], stage=3, s=2)
    X = identity_block(X, 3, [128, 128, 512], stage=3, block=1)
    X = identity_block(X, 3, [128, 128, 512], stage=3, block=2)
    X = identity_block(X, 3, [128, 128, 512], stage=3, block=3)

    # Final stage
    X = AveragePooling2D(pool_size=(2, 2),
                         strides=(2, 2))(X)
    X = Flatten()(X)
    X = Dense(features, activation='sigmoid', name='fc' + str(features),
              kernel_initializer='glorot_uniform')(X)

    # Create model
    model = Model(inputs=X_input, outputs=X, name='ResNet')

    return model
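For reference, here is roughly what the omitted identity_block looks like. This is the standard ResNet-50 bottleneck layout; the layer names and exact wiring are illustrative and may differ slightly from my actual code:

from keras.layers import Add

def identity_block(X, f, filters, stage, block):
    '''
    Bottleneck identity block: 1x1 -> fxf -> 1x1 convolutions, each followed
    by BatchNorm, with the input added back before the final ReLU.
    '''
    F1, F2, F3 = filters
    base = 'stg-' + str(stage) + '_blk-' + str(block) + '_'
    X_shortcut = X

    X = Conv2D(F1, (1, 1), strides=(1, 1), padding='valid',
               name=base + 'cnl-a', kernel_initializer='glorot_uniform')(X)
    X = BatchNormalization(axis=3, name=base + 'bnl-a')(X)
    X = Activation('relu')(X)

    X = Conv2D(F2, (f, f), strides=(1, 1), padding='same',
               name=base + 'cnl-b', kernel_initializer='glorot_uniform')(X)
    X = BatchNormalization(axis=3, name=base + 'bnl-b')(X)
    X = Activation('relu')(X)

    X = Conv2D(F3, (1, 1), strides=(1, 1), padding='valid',
               name=base + 'cnl-c', kernel_initializer='glorot_uniform')(X)
    X = BatchNormalization(axis=3, name=base + 'bnl-c')(X)

    # Skip connection: add the block input back, then apply the final ReLU
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
    return X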

Snapshot of the graph: bnl_stg-1 (the Stage-1 BatchNorm layer)


Solution

  • You should not worry about it. Batch Normalization behaves differently during training and inference, so Keras adds a Boolean variable to control it (keras_learning_phase, if I remember correctly). That is why all these layers are connected in the graph: they all read the same flag. You can expect similar behavior with Dropout layers.
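    A minimal sketch to make the shared flag visible (this assumes a TF 1.x-style Keras backend; the toy model below is only illustrative, not taken from the question):

    import keras.backend as K
    from keras.layers import Input, BatchNormalization
    from keras.models import Model

    inp = Input((4,))
    out = BatchNormalization()(inp)
    model = Model(inp, out)

    # Every BatchNormalization (and Dropout) layer reads this single boolean
    # tensor, which is why TensorBoard draws an edge from each of them to it.
    print(K.learning_phase())

    # The same layer produces different outputs depending on the phase:
    f = K.function([inp, K.learning_phase()], [out])
    x = [[1., 2., 3., 4.]]
    print(f([x, 1]))  # training phase: normalize with batch statistics
    print(f([x, 0]))  # test phase: use the moving averages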