Tags: neural-network, backpropagation, reinforcement-learning

Training a neural network with large linear outputs


I am programming a feed-forward neural network which I want to use in combination with reinforcement learning. It has one hidden layer with tanh as the activation function and a linear output layer.

I have three inputs, which are normalized to [0, 1]. There are also three output nodes, which give the reward received from the environment. The rewards are always negative: at the beginning, when the chosen actions lead to bad decisions, a reward can be around -50000; with good decisions it can be around -5.

I am struggling with the implementation of backpropagation. Since the rewards are so big, the error values are huge, which creates huge weights. After a few training rounds, the weights to the hidden layer are so big that my hidden-layer nodes only output the values -1 or 1.

This is my code:

public void trainState(double[] observation, double[] hiddenEnergy, double oldVal, int chosenAction, double target, double lambda)
{
    // Output-layer error for the chosen action, scaled by the learning rate
    double deltaK = (target - oldVal) * lambda;
    double deltaJ;

    for (int j = 0; j < _hiddenSize; j++)
    {
        // Backpropagate through tanh: the derivative is 1 - tanh(x)^2,
        // and hiddenEnergy[j] already holds the tanh output
        deltaJ = (1 - hiddenEnergy[j] * hiddenEnergy[j]) * deltaK * _toOutputWeights[j][chosenAction];

        for (int i = 0; i < _inputSize; i++)
        {
            _toHiddenWeights[i][j] += deltaJ * observation[i];
        }
    }

    // Update the weights from the hidden layer to the chosen output node
    for (int i = 0; i < _hiddenSize; i++)
    {
        _toOutputWeights[i][chosenAction] += deltaK * hiddenEnergy[i];
    }
}

Solution

  • You said: "Since the rewards are so big, the error values are huge, which creates huge weights."

    Emphasis mine

    I suggest using the log of the rewards. This is a standard trick in math for controlling huge values. That way your errors and weights would be much more manageable.

    // rewards are negative, so take the log of the magnitude and keep the sign
    log_of_rewards = -log(-rewards);
    // use this value in calculations
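
    A minimal sketch of how that scaling could look in Java, assuming a helper method (scaleReward, a name introduced here purely for illustration) applied to the raw environment reward before it is passed to trainState as target. Math.log1p is used so a reward of 0 would map to 0, and the result is negated because the rewards are negative:

    public final class RewardScaling
    {
        // Compresses a non-positive reward into a small negative range.
        public static double scaleReward(double reward)
        {
            // Rewards are assumed to be <= 0, as described in the question
            // (roughly -50000 .. -5). log1p(-reward) grows slowly with the
            // magnitude, and negating keeps the "larger is better" ordering:
            // -50000 -> about -10.8, -5 -> about -1.8.
            return -Math.log1p(-reward);
        }
    }

    The scaled value would then replace the raw reward wherever it feeds into the training target, for example trainState(observation, hiddenEnergy, oldVal, chosenAction, RewardScaling.scaleReward(rawReward), lambda);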