Tags: python, conv-neural-network, backpropagation

How to combat huge numbers produced by a ReLU-based CNN


I have a CNN with a structure loosely based on AlexNet; see below:

Convolutional Neural Network structure:
100x100x3      Input image
25x25x12       Convolutional layer: 4x4x12, stride = 4, padding = 0
12x12x12       Max pooling layer: 3x3, stride = 2
12x12x24       Convolutional layer: 5x5x24, stride = 1, padding = 2
5x5x24         Max pooling layer: 4x4, stride = 2
300x1x1        Flatten layer: 600 -> 300
300x1x1        Fully connected layer: 300
3x1x1          Fully connected layer: 3
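
For reference, the spatial sizes listed above follow from the standard output-size formula (W - K + 2P) / S + 1. A quick sanity check (a sketch, not the asker's own code):

```python
def conv_out(size, kernel, stride, padding=0):
    """Standard output-size formula for conv/pool layers."""
    return (size - kernel + 2 * padding) // stride + 1

s = conv_out(100, 4, 4, 0)   # 25  (conv 4x4, stride 4, no padding)
s = conv_out(s, 3, 2)        # 12  (max pool 3x3, stride 2)
s = conv_out(s, 5, 1, 2)     # 12  (conv 5x5, stride 1, padding 2)
s = conv_out(s, 4, 2)        # 5   (max pool 4x4, stride 2)
print(s * s * 24)            # 600 values fed into the flatten layer
```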

Obviously, with only max pooling and convolutional layers, the activations can drift towards 0 or infinity, depending on how the weights are scaled. I was wondering about any approaches to combat this, seeing as I would like to avoid such large numbers.
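
To illustrate the drift, here is a minimal NumPy sketch (not the actual network): stacking ReLU layers whose weights are drawn with too small or too large a standard deviation makes the activation magnitudes shrink or grow geometrically.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(300)

for std in (0.01, 1.0):            # too small vs. too large weight scale
    a = x.copy()
    for _ in range(10):            # ten ReLU layers of width 300
        W = rng.standard_normal((300, 300)) * std
        a = np.maximum(W @ a, 0)   # ReLU
    print(std, np.abs(a).mean())   # vanishes for 0.01, explodes for 1.0
```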

One problem that arises from this is if you use sigmoid in the final layers: the derivative of sigmoid is s(x)*(1-s(x)), so large inputs inevitably push the value of sigmoid to 1, and on backprop you get 1*(1-1) = 0, which obviously doesn't go down too well.
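
A minimal numeric illustration of that saturation (an assumed sketch, not the asker's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in (1.0, 10.0, 100.0):
    s = sigmoid(x)
    print(x, s, s * (1 - s))   # gradient factor s(x)*(1-s(x)) -> 0 as x grows
```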

So I would like to know of any ways to keep these numbers low.

Tagged with python because that's what I implemented this in, using my own code.


Solution

  • I asked this question on AI Stack Exchange (which it is better suited for), and with the correct weight initialisation the numbers neither explode nor vanish on a forward or backward pass. See here: https://ai.stackexchange.com/questions/13106/how-are-exploding-numbers-in-a-forward-pass-of-a-cnn-combated. A sketch of such an initialisation is below.
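
The usual choice of "correct" initialisation for ReLU layers is He (Kaiming) initialisation, std = sqrt(2 / fan_in). A sketch under that assumption (not the asker's exact code, and not necessarily the only scheme the linked answer discusses):

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(shape, fan_in):
    """He (Kaiming) initialisation: std = sqrt(2 / fan_in), suited to ReLU layers."""
    return rng.standard_normal(shape) * np.sqrt(2.0 / fan_in)

# e.g. first conv layer: 12 filters of 4x4x3 -> fan_in = 4*4*3 = 48
W_conv1 = he_init((12, 4, 4, 3), fan_in=4 * 4 * 3)

# fully connected layer 600 -> 300
W_fc = he_init((300, 600), fan_in=600)
```

With weights scaled this way, the mean squared activation stays roughly constant from layer to layer, so the forward pass neither blows up nor collapses and the sigmoid in the final layers stays out of its saturated region.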