While implementing agents for various problems, I have noticed that my actor loss decreases as expected, but my critic loss keeps increasing even though the learned policy is very good. This happens for DDPG, PPO, etc.
Any thoughts on why my critic loss is increasing?
I tried playing with the hyperparameters, but that actually made my policy worse.
In Reinforcement Learning, you typically shouldn't pay much attention to the precise values of your losses. They are not informative in the same sense that they would be in, for example, supervised learning. The losses are only there to compute the correct updates for your RL approach; they do not give you any real indication of how well or poorly you are doing.
This is because in RL, your learning targets are often non-stationary; they are often a function of the very policy that you are modifying (hopefully improving!). It is entirely possible that, as the performance of your RL agent improves, your loss actually increases. Due to its improvement, the agent may discover new parts of its search space, which lead to new target values that it was previously completely oblivious to.
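To make this concrete, here is a minimal sketch (in PyTorch, assuming a DDPG-style setup; the names `target_actor`, `target_critic`, `batch` and `gamma` are illustrative placeholders, not from your code) of how the critic's regression target is recomputed from the current target policy, and therefore moves as the actor improves:

```python
import torch

def critic_targets(batch, target_actor, target_critic, gamma=0.99):
    # Bootstrapped TD target: r + gamma * Q'(s', mu'(s'))
    # The target depends on the *current* target policy mu', so it is non-stationary.
    with torch.no_grad():
        next_actions = target_actor(batch["next_states"])
        next_q = target_critic(batch["next_states"], next_actions)
        targets = batch["rewards"] + gamma * (1.0 - batch["dones"]) * next_q
    return targets
```

As the actor discovers higher-reward regions, `mu'(s')` changes, the targets above change with it, and the critic's MSE against those moving targets can rise even while the policy is genuinely getting better.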
The only truly reliable metric for how well your agent is doing is the return it collects in evaluation runs.
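If you want something to plot over training instead of the critic loss, a simple evaluation loop might look like the following sketch (assuming a Gymnasium-style `env` and a deterministic `policy` callable; both names are illustrative):

```python
import numpy as np

def evaluate(env, policy, n_episodes=10):
    # Run the current policy greedily (no exploration noise) and average
    # the undiscounted episode returns.
    returns = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done, ep_return = False, 0.0
        while not done:
            action = policy(obs)
            obs, reward, terminated, truncated, _ = env.step(action)
            ep_return += reward
            done = terminated or truncated
        returns.append(ep_return)
    return np.mean(returns)
```

Track this quantity periodically during training; if it keeps going up, your agent is improving, regardless of what the critic loss is doing.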