tensorflow · pytorch · reinforcement-learning

Gradient calculation in A2C


In A2C, the advantage actor-critic algorithm, the weights are updated via the equations:

delta = TD error = r + gamma*V(s', w) - V(s, w) and

theta = theta + alpha*delta*[Grad(log(PI(a|s,theta)))] and

w = w + beta*delta*[Grad(V(s,w))]
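For concreteness, here is a minimal sketch of the two update rules above, using a hypothetical linear-softmax policy and linear value function (the feature map, dimensions, and transition values are all illustrative assumptions, not part of the original question):

```python
import numpy as np

# Hypothetical linear function approximators (assumed for illustration):
# pi(.|s) is a softmax over theta @ phi(s), and V(s, w) = w @ phi(s).
n_features, n_actions = 4, 2
theta = np.zeros((n_actions, n_features))  # actor parameters
w = np.zeros(n_features)                   # critic parameters
alpha, beta, gamma = 0.1, 0.1, 0.99

def features(s):
    # toy feature vector, just for the sketch
    phi = np.zeros(n_features)
    phi[s % n_features] = 1.0
    return phi

def policy(s):
    prefs = theta @ features(s)
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

# One actor-critic update for a single transition (s, a, r, s')
s, a, r, s_next = 0, 1, 1.0, 1
phi = features(s)
delta = r + gamma * (w @ features(s_next)) - (w @ phi)  # TD error

# Grad(log(PI(a|s,theta))) for a linear-softmax policy:
# phi(s) on the taken action's row, minus pi(.|s) * phi(s) on every row
pi = policy(s)
grad_log_pi = -np.outer(pi, phi)
grad_log_pi[a] += phi

theta = theta + alpha * delta * grad_log_pi  # actor update
w = w + beta * delta * phi                   # critic update (Grad V = phi)
```

With neural networks, the only change is that the gradients Grad(log(PI)) and Grad(V) are produced by backpropagation instead of the closed-form expressions used here.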

So my question is, when using neural networks to implement this,

  1. how are the gradients calculated and

  2. am I correct that the weights are updated via the optimization methods in TensorFlow or PyTorch?

Thanks, Jon


Solution

  • I'm not quite clear on what you mean to update with w, but I'll answer the question for theta, assuming it denotes the parameters of the actor model.

    1) Gradients can be calculated in a variety of ways, but focusing on PyTorch: you can call .backward() on f = alpha * delta * log(PI(a|s,theta)), which computes df/dx for every parameter x that is connected to f through autograd's computation graph.

    2) You are indeed correct that the weights are updated via the optimization machinery in PyTorch: autograd computes the gradients, but to complete the optimization step you must construct an optimizer from torch.optim (e.g. torch.optim.Adam) over the network's parameters (its weights and biases) and call its .step() method.
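    Putting both points together, here is a minimal PyTorch sketch of one A2C update. The network shapes, learning rates, and the sampled transition are illustrative assumptions; note that .backward() minimizes a loss, so the ascent updates in the question become the negatives of the losses below:

    ```python
    import torch
    import torch.nn as nn

    # Illustrative networks (assumed shapes, not from the original post)
    obs_dim, n_actions = 4, 2
    actor = nn.Linear(obs_dim, n_actions)   # produces action logits for PI(.|s, theta)
    critic = nn.Linear(obs_dim, 1)          # produces V(s, w)

    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    # A single made-up transition (s, a, r, s')
    s = torch.randn(1, obs_dim)
    s_next = torch.randn(1, obs_dim)
    r, gamma = 1.0, 0.99

    # TD error; computed under no_grad so delta acts as a constant in both losses
    with torch.no_grad():
        delta = r + gamma * critic(s_next) - critic(s)

    # Actor: minimizing -delta * log(PI(a|s,theta)) ascends the policy gradient
    dist = torch.distributions.Categorical(logits=actor(s))
    a = dist.sample()
    actor_loss = -(delta * dist.log_prob(a)).mean()

    # Critic: minimizing -delta * V(s) reproduces w <- w + beta*delta*Grad(V(s,w))
    critic_loss = -(delta * critic(s)).mean()

    actor_opt.zero_grad()
    actor_loss.backward()      # autograd fills .grad on the actor's parameters
    actor_opt.step()           # the optimizer applies the update

    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    ```

    In practice the critic is often trained on the squared TD error instead, which has the same gradient direction up to a factor of delta, but the version above follows the update rules exactly as written in the question.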