In the original paper on Proximal Policy Optimization Algorithms, in equation (4), the authors use an operation denoted by KL[]. Unfortunately, they never give a definition for it.
My question: what does the KL[] operation stand for?
Maybe it's the KL divergence? As far as I know, the KL divergence is used to measure how much two probability distributions differ.
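To make my guess concrete: if KL[] really is the KL divergence, then for two discrete distributions p and q I would expect it to be the usual sum over x of p(x) * log(p(x) / q(x)). Below is a minimal sketch of that reading. The function name `kl_divergence` and the example probabilities are my own, made up for illustration; none of it comes from the paper.

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence KL(p || q) = sum_x p(x) * log(p(x) / q(x)), in nats."""
    # Terms with p(x) == 0 are skipped (they contribute nothing by convention).
    # This sketch assumes q(x) > 0 wherever p(x) > 0; otherwise it raises ZeroDivisionError.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical action probabilities for a single state under an old and a new policy.
old_policy = [0.5, 0.3, 0.2]
new_policy = [0.4, 0.4, 0.2]

print(kl_divergence(old_policy, new_policy))  # small positive number
print(kl_divergence(old_policy, old_policy))  # 0.0 for identical distributions
```

Note that under this definition the result is not symmetric: swapping the two arguments generally gives a different value, so the order of the arguments inside KL[] would matter.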