Tags: c++, machine-learning, neural-network, logistic-regression, backpropagation

Convolutional network filter always negative


Last week I asked a question about a network I've been building, and I iterated on the suggestions, which led me to finding a few problems. I've come back to this project, fixed up all the issues, and learnt a lot more about CNNs in the process. Now I'm stuck on an issue where all of my weights move to massively negative values, which, coupled with the ReLU, means the output image always ends up completely black (making it impossible for the classifier to do its job).

I'm training on two labeled images:

[two labeled input images: a cross and a circle]

These are passed into a two-layer network: a one-filter 3×3 convolutional layer followed by a classifier (which gets 100% on its own).

On the first iteration the output from the conv layer looks like (images in same order as above):

[conv layer outputs for the two images, in the same order as above]

The filter is 3×3×3, since the images are RGB. The weights are all random numbers in the range 0.0f–1.0f. On the next iteration the images are completely black, and printing the filters shows that they are now in the range of -49678.5f (the highest I can see) to -61932.3f.

This issue in turn is due to the gradients passed back from the Logistic Regression/Linear layer being crazy high for the cross (label 0, prediction 0). For the circle (label 1, prediction 0) the values are roughly between -12 and -5, but for the cross they are all in the positive high-1000 to high-2000 range.

The code which sends these back looks something like (some parts omitted):

void LinearClassifier::Train(float* x, float output, float y)
{
    // Error term: difference between the prediction and the label
    float h = output - y;
    float average = 0.0f;

    // Gradient for each weight is the error scaled by its input
    for (int i = 1; i < m_NumberOfWeights; ++i)
    {
        float error = h * x[i - 1];
        m_pGradients[i - 1] = error;
        average += error;
    }

    average /= static_cast<float>(m_NumberOfWeights - 1);

    // Gradient descent step on the non-bias weights
    for (int theta = 1; theta < m_NumberOfWeights; ++theta)
    {
        m_pWeights[theta] = m_pWeights[theta] - learningRate * m_pGradients[theta - 1];
    }

    // Bias is updated with the average of the gradients
    m_pWeights[0] -= learningRate * average;
}

This is passed back to the single convolution layer:

// This code is in three nested for loops (for layer, for outWidth, for outHeight)
float gradient = 0.0f;
// ReLU derivative: only pass the gradient through where the activation was positive
if (m_pOutputBuffer[outputIndex] > 0.0f)
{
    gradient = outputGradients[outputIndex];
}

for (int z = 0; z < m_InputDepth; ++z)
{
    for (int u = 0; u < m_FilterSize; ++u)
    {
        for (int v = 0; v < m_FilterSize; ++v)
        {
            int x = outX + u - 1;
            int y = outY + v - 1;

            int inputIndex = x + y * m_OutputWidth + z * m_OutputWidth * m_OutputHeight;
            int kernelIndex = u + v * m_FilterSize + z * m_FilterSize * m_FilterSize;

            // Gradient w.r.t. the input: filter weight times the incoming gradient
            m_pGradients[inputIndex] += m_Filters[layer][kernelIndex] * gradient;
            // Accumulated gradient w.r.t. the filter weights: input times the incoming gradient
            m_GradientSum[layer][kernelIndex] += input[inputIndex] * gradient;
        }
    }
}

This code is run by passing each image through one at a time. The gradients are obviously pointing in the right direction, but how do I stop the huge gradients from throwing off the prediction function?


Solution

  • ReLU activations are notorious for doing this. You usually have to use a low learning rate. The reasoning is that when the ReLU returns positive numbers it can continue to learn freely, but if a unit gets into a position where the signal coming into it is always negative it can become a "dead" neuron and never activate again.

    Also, initializing your weights is more delicate with ReLU. It appears that you are initializing them in the range 0–1, which creates a huge bias. Two tips here: use a range centered around 0, and use a range that is much smaller. A normal distribution with mean 0 and standard deviation 0.02 usually works well (see the sketch below).
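
    As a rough illustration of both points, here is a minimal sketch of a zero-centered Gaussian initialization (mean 0, std 0.02) for a filter, plus the ReLU gradient gating described above. The helper names (InitialiseFilter, ReluGradient) and the use of <random> are assumptions for this sketch, not part of the asker's code.

#include <random>
#include <vector>

// Sketch only: hypothetical helpers, not taken from the original project.

// Fill a filterSize*filterSize*depth filter with samples from N(0, 0.02)
// instead of uniform values in 0-1, so the weights start small and
// centered around zero.
std::vector<float> InitialiseFilter(int filterSize, int depth)
{
    std::mt19937 rng{ std::random_device{}() };
    std::normal_distribution<float> dist(0.0f, 0.02f);

    std::vector<float> weights(filterSize * filterSize * depth);
    for (float& w : weights)
    {
        w = dist(rng);
    }
    return weights;
}

// ReLU only passes gradient where the forward activation was positive;
// if the signal into a unit is always negative, its gradient is always
// zero and the unit stops learning (a "dead" ReLU).
float ReluGradient(float activation, float upstreamGradient)
{
    return activation > 0.0f ? upstreamGradient : 0.0f;
}

    Combined with a noticeably smaller learning rate, this kind of initialization keeps the first few updates from pushing the filter weights into the huge negative range described in the question.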