Tags: c, machine-learning, math, neural-network, backpropagation

Backpropagation Through Hidden Layers in Neural Network Implementation


I've been implementing a neural network in C and have run into a problem with backpropagation through the hidden layers. My backpropagation function works correctly for a single-layer network (i.e., a network with no hidden layers), but it seems to fail when I try to backpropagate the error through one or more hidden layers. The loss is computed correctly on the first iteration, but then grows rapidly and diverges to infinity.

Here's my current implementation of the backpropagation algorithm:

void backpropagation(NNetwork* network) {
Layer* outputLayer = network->layers[network->layerCount - 1];

// for each output (note: the bound is currently hard-coded to 1, so only the first training example is processed)
for(int outputIndex = 0; outputIndex < 1; outputIndex++) {
    
    // TODO: Implement this.
    double totalCost = 0.0f;

    // the output layer's step
    int layerIndex = network->layerCount - 1;
    Layer* currentLayer = network->layers[layerIndex];
    Vector* dLoss_dInputs = createVector(currentLayer->weights->rows);

    for(int outputNeuronIndex = 0; outputNeuronIndex < outputLayer->neuronCount; outputNeuronIndex++) {
        Vector* predictions = network->data->trainingOutputs->data[outputIndex];
        double prediction = predictions->elements[outputNeuronIndex];

        double target = network->data->yValues->elements[outputIndex];
        double error =  target - prediction;
        error *= error;
        error *= 0.5;
        /* TODO: ABSTRACT THIS TO SUIT MULTIPLE LOSS FUNCTIONS  
           derivative of MSE is -1 * error, as:
           derivative of 1/2 * (value)^2 = 1/2 * 2(value) => 2/2 * (value) = 1 * value
        
           Using the chain rule for differentiation ((f(g(x)))' = f'(g(x)) * g'(x)), we then multiply this by the derivative of the inner function,
           g(x) = (desired - predicted), with respect to 'predicted', which gives g'(x) = -1
           Therefore, the derivative of the MSE with respect to 'predicted' is: f'(g(predicted)) * g'(predicted) = (desired - predicted) * -1 = predicted - desired
        */
        double dLoss_dOutput = -1 * error;

        // derivative of the leaky ReLU activation: slope 1 for positive weighted sums, 0.01 otherwise
        double dOutput_dWeightedSum = currentLayer->weightedSums->elements[outputNeuronIndex] > 0 ? 1 : 0.01;
        double dLoss_dWeightedSum = dLoss_dOutput * dOutput_dWeightedSum;

        // dLoss/dInputN = Σ [(dLoss/dOutput_i) * (dOutput_i/dWeightedSum_i) * w_i->N]
        // dLoss/dInputN = Σ [dLoss_dOutput * dOutput_dWeightedSum * weight]

        for(int weightIndex = 0; weightIndex < currentLayer->weights->rows; weightIndex++) {
            
            double dWeightedSum_dWeight = matrixToVector(network->layers[layerIndex]->input)->elements[weightIndex];
            
            double dLoss_dWeight = dLoss_dWeightedSum * dWeightedSum_dWeight;
            
            currentLayer->gradients->data[weightIndex]->elements[outputNeuronIndex] = dLoss_dWeight;                
        }

        for(int prevLayerNeuronIndex = 0; prevLayerNeuronIndex < network->layers[layerIndex - 1]->neuronCount; prevLayerNeuronIndex++) {
            double dLoss_dInputN = 0.0f;
            for(int weightColumnIndex = 0; weightColumnIndex < currentLayer->weights->columns; weightColumnIndex++) {
                dLoss_dInputN += (dLoss_dOutput * dOutput_dWeightedSum) * currentLayer->weights->data[prevLayerNeuronIndex]->elements[weightColumnIndex];
            }
            printf("DLOSS_DINPUTN: %f \n", dLoss_dInputN);
            dLoss_dInputs->elements[prevLayerNeuronIndex] = dLoss_dInputN;
        }
    }

    // propagate the error backwards through the remaining layers
    for(layerIndex = network->layerCount - 2; layerIndex >= 0; layerIndex--) {
        currentLayer = network->layers[layerIndex];
        Vector* dLoss_dInputsHidden = createVector(currentLayer->weights->rows);
        for(int neuronIndex = 0; neuronIndex < currentLayer->neuronCount; neuronIndex++) {
            double dLoss_dOutput = dLoss_dInputs->elements[neuronIndex];

            double dOutput_dWeightedSum = currentLayer->weightedSums->elements[neuronIndex] > 0 ? 1 : 0.01;
            double dLoss_dWeightedSum = dLoss_dOutput * dOutput_dWeightedSum;


            for(int weightIndex = 0; weightIndex < currentLayer->weights->rows; weightIndex++) {
                
                double dWeightedSum_dWeight = matrixToVector(network->layers[layerIndex]->input)->elements[weightIndex];
                
                double dLoss_dWeight = dLoss_dWeightedSum * dWeightedSum_dWeight;
                
                currentLayer->gradients->data[weightIndex]->elements[neuronIndex] = dLoss_dWeight;
                
            }

            if(layerIndex == 0) {
                continue;
            }

            for(int prevLayerNeuronIndex = 0; prevLayerNeuronIndex < network->layers[layerIndex - 1]->neuronCount; prevLayerNeuronIndex++) {
                double dLoss_dInputN = 0.0f;
                for(int weightColumnIndex = 0; weightColumnIndex < currentLayer->weights->columns; weightColumnIndex++) {
                    dLoss_dInputN += (dLoss_dOutput * dOutput_dWeightedSum) * currentLayer->weights->data[prevLayerNeuronIndex]->elements[weightColumnIndex];
                }
                
                dLoss_dInputsHidden->elements[prevLayerNeuronIndex] = dLoss_dInputN;
            }
        }
        freeVector(dLoss_dInputs);
        dLoss_dInputs = dLoss_dInputsHidden;

    }
    }
}

My network structure is as follows (a construction sketch follows the list):

  • Input layer with a certain number of neurons.
  • One or more hidden layers, each with a certain number of neurons.
  • Output layer with a single neuron.
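For concreteness, here is a minimal sketch of how a network with this shape might be wired up. The createNetwork and createLayer helpers and their parameters are hypothetical stand-ins, since the question does not show the construction code; the only assumption carried over from the snippet above is that each layer owns a weight matrix of shape (inputs x neurons).

/* Hypothetical construction of the shape described above:
   4 inputs -> 8 hidden neurons -> 1 output neuron.
   createNetwork and createLayer are assumed helpers, not the actual API. */
NNetwork* buildExampleNetwork(void) {
    NNetwork* network = createNetwork(2);    /* two trainable layers */
    network->layers[0] = createLayer(4, 8);  /* hidden layer: 4 inputs, 8 neurons */
    network->layers[1] = createLayer(8, 1);  /* output layer: 8 inputs, 1 neuron  */
    return network;
}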

Any help or guidance?


Solution

  • The problem was this line: double dLoss_dOutput = -1 * error; which should be double dLoss_dOutput = -1 * (target - prediction);

    By that point, error already holds the squared, halved loss 0.5 * (target - prediction)^2 rather than the raw difference, so negating it does not give the derivative of the loss with respect to the prediction.

    I also had some indexing issues. If you want to see the fixed version of the code: https://github.com/MevlutArslan/neural-networks-from-scratch
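To see why the fix matters, here is a small self-contained check (not taken from the repository; it just restates the loss from the question in isolation) that compares both expressions against a numerical central-difference derivative:

#include <stdio.h>

/* The loss from the question for one prediction: L(p) = 0.5 * (target - p)^2 */
static double loss(double target, double p) {
    double diff = target - p;
    return 0.5 * diff * diff;
}

int main(void) {
    double target = 1.0, prediction = 0.3, h = 1e-6;

    /* numerical derivative of the loss with respect to the prediction */
    double numeric = (loss(target, prediction + h) - loss(target, prediction - h)) / (2.0 * h);

    /* buggy expression from the question: negates the already-squared loss */
    double error = 0.5 * (target - prediction) * (target - prediction);
    double buggy = -1 * error;

    /* fixed expression: dL/dp = -(target - prediction) = prediction - target */
    double fixed = -1 * (target - prediction);

    printf("numeric: %f, buggy: %f, fixed: %f\n", numeric, buggy, fixed);
    return 0;
}

Only the fixed expression matches the numerical slope (-0.7 here). The buggy one is also always non-positive because the loss is squared, so it points in the wrong direction whenever the prediction overshoots the target, which is consistent with the loss blowing up after the first iteration.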