machine-learning reinforcement-learning

Proximal Policy Optimization Algorithms paper - definition of "KL" operation?


In the original paper on Proximal Policy Optimization Algorithms

https://arxiv.org/pdf/1707.06347.pdf

in equation (4) the authors use an operation denoted by KL[]. Unfortunately, they never give a definition for it.

My question:

What does the KL[] operation stand for?


Solution

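  • It is the KL (Kullback-Leibler) divergence.

    The KL divergence measures how much one probability distribution differs from another. In equation (4) it compares the old policy π_θold(·|s_t) with the updated policy π_θ(·|s_t), and the constraint keeps their expected divergence below a threshold δ.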
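
For two discrete distributions p and q over the same outcomes, the KL divergence is usually written KL[p, q] = sum over x of p(x) * log(p(x) / q(x)). Here is a minimal Python sketch of that formula, in case a concrete number helps. The function name `kl_divergence` and the two example action distributions are made up for illustration and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """Return KL[p, q] = sum_i p_i * log(p_i / q_i) for discrete distributions.

    p and q are sequences of probabilities over the same outcomes.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            # By convention, a term with p_i = 0 contributes nothing.
            continue
        if q_i == 0.0:
            # p puts mass where q has none: the divergence is infinite.
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical action probabilities in one state (illustrative values only).
old_policy = [0.5, 0.3, 0.2]
new_policy = [0.4, 0.4, 0.2]

print(kl_divergence(old_policy, new_policy))  # small positive number
print(kl_divergence(new_policy, old_policy))  # differs: KL is not symmetric
print(kl_divergence(old_policy, old_policy))  # 0.0 for identical distributions
```

The divergence is zero only when the two distributions are identical and grows as they move apart, which is why bounding it keeps the new policy close to the old one. Note that the argument order matters, since KL[p, q] and KL[q, p] are generally different.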