Tags: conv-neural-network, resnet, batch-normalization, deep-residual-networks

adding batch norm to a non-batch norm layer


I'm implementing a modified ResNet architecture. In the Basic Block of ResNet, I've used a Conv layer in the shortcut connection. So my main path consists of two Conv layers, each followed by a Batch Norm layer and a ReLU layer, while the shortcut connection has only a Conv layer with no Batch Norm layer. The shortcut output is then added to the main path. The figure shown below summarises this setup.
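A minimal sketch of the block as described, assuming PyTorch (the post shows no code, so the framework and exact layer parameters such as kernel sizes and strides are assumptions):

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual block as described in the question: Conv-BN-ReLU twice on the
    main path, a plain Conv (no Batch Norm) on the shortcut connection."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Main path: two Conv layers, each followed by Batch Norm and ReLU
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Shortcut: Conv only, without a Batch Norm layer (the setup in question)
        self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                  stride=stride, bias=False)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        # Raw (unnormalized) conv output is added to the batch-normed main path
        return out + self.shortcut(x)
```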

It's quite well known that a batch-normed output should be added to another batch-normed output, but here the raw conv output is added to the batch-normed main path.

Surprisingly, my model gives better performance with this architecture. When I add a Batch Norm layer to the shortcut connection, the model's performance drops drastically and it does not even converge to the level of the former setting, even after exhaustive hyperparameter tuning.

So my question is: should I keep the Batch Norm layer in the shortcut connection even though the performance is worse, because that respects the literature, or should I go with the better-performing variant without the Batch Norm layer in the shortcut connection? Also, if I want to publish this work, a reviewer is definitely going to raise this issue; what sort of explanation should I add beforehand to make things clear?


Solution

  • Adding a batch-normed output to another output that has not been passed through a Batch Norm layer is not the right way to do things; it defeats the whole purpose of using Batch Norm in the network. So yes, you need to add a Batch Norm layer in the residual connection as well (see the sketch below).
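For concreteness, a hedged sketch of the change the answer suggests, again assuming PyTorch: a hypothetical helper that builds a projection shortcut with Batch Norm, which could replace the `self.shortcut` assignment in the block sketched above so that both branches are normalized before the addition.

```python
import torch.nn as nn

def make_shortcut(in_channels, out_channels, stride=1):
    """Projection shortcut with Batch Norm, so the tensor added to the
    batch-normed main path is itself batch-normed."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1,
                  stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )
```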