Tags: tensorflow, machine-learning, reinforcement-learning, openai-gym

I need help understanding reinforcement learning code


I've been trying to solve the OpenAI MountainCarContinuous-v0 environment for a while, but I am stuck.

After spending weeks trying to solve it on my own, I am now just trying to understand someone else's code. Here is the link the person used to solve the environment. Specifically, I need help with the loss function.

In the GitHub code, the loss is written as

self.norm_dist = tf.contrib.distributions.Normal(self.mu, self.sigma)
self.loss = -tf.log(self.norm_dist.prob(self.action_train) + 1e-5) * self.advantage_train - self.lamb * self.norm_dist.entropy()

What is this loss function doing? If you could describe it in simple terms that would help me so much.


Solution

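  • The first line defines a normal (Gaussian) distribution over actions with mean self.mu and standard deviation self.sigma (the second argument of Normal is the scale, not the variance). The second line defines the loss as -A * log(p(a) + 1e-5) - \lambda * H, where A is the advantage, p(a) is the probability density of the action a that was taken (sampled from this distribution), H is the entropy of the distribution, and \lambda (self.lamb) weights the entropy term. Minimizing the first term makes actions with positive advantage more likely and actions with negative advantage less likely. The entropy is subtracted, not added, so minimizing the loss also rewards higher entropy, which keeps the policy from collapsing to a very narrow distribution too early and so encourages exploration. The 1e-5 only guards against taking the log of zero.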
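
To make the formula concrete, here is a minimal NumPy sketch of the same per-sample loss. It is not the linked author's code: the function name gaussian_policy_loss and the default lamb=1e-2 are assumptions chosen for illustration, and the closed-form density and entropy stand in for what norm_dist.prob and norm_dist.entropy compute for a normal distribution.

```python
import numpy as np

def gaussian_policy_loss(mu, sigma, action, advantage, lamb=1e-2, eps=1e-5):
    """Per-sample loss: -log(p(a) + eps) * A - lamb * H.

    lamb=1e-2 is a hypothetical default; the original code's value is not shown.
    """
    # Density of the action under Normal(mu, sigma); sigma is a standard deviation.
    prob = np.exp(-0.5 * ((action - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Differential entropy of a normal distribution: 0.5 * log(2 * pi * e * sigma^2).
    entropy = 0.5 * np.log(2.0 * np.pi * np.e * sigma ** 2)
    # Policy-gradient term minus the entropy bonus.
    return -np.log(prob + eps) * advantage - lamb * entropy

# With a positive advantage, an action near the mean gives a lower loss
# than an unlikely action far from the mean.
near = gaussian_policy_loss(mu=0.0, sigma=1.0, action=0.0, advantage=1.0)
far = gaussian_policy_loss(mu=0.0, sigma=1.0, action=2.0, advantage=1.0)
print(near < far)
```

With the advantage set to zero, only the entropy term remains, so a larger sigma yields a lower loss. That is the exploration pressure described above.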