In A2C, the advantage actor-critic algorithm, the weights are updated via the equations:

delta = TD error,
theta = theta + alpha * delta * Grad(log(PI(a|s, theta))), and
w = w + beta * delta * Grad(V(s, w)).
So my question is: when implementing this with neural networks, how are the gradients calculated, and am I correct that the weights are updated via the optimization methods in TensorFlow or PyTorch?
Thanks, Jon
I'm not quite clear on what you mean to update with w, but I'll answer the question for theta, assuming it denotes the parameters of the actor model.
1) Gradients can be calculated in a variety of ways, but focusing on PyTorch, you can call .backward() on f(x) = alpha * delta * log(PI(a|s, theta)), which will compute df/dx for every parameter x that is chained to f(x) via autograd. (In practice you call .backward() on the negative of this expression, since PyTorch optimizers perform gradient descent rather than ascent.)
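For concreteness, here's a minimal sketch of that call for a discrete action space. The names `actor`, `state`, `action`, and `delta` are placeholders for your own objects, and alpha is left out here because it's normally supplied as the optimizer's learning rate:

```python
import torch

# Minimal sketch, assuming `actor` is an nn.Module mapping a state tensor
# to action logits, and `delta` is the TD error as a scalar tensor computed
# elsewhere (detached so the actor's loss doesn't backprop into the critic).
dist = torch.distributions.Categorical(logits=actor(state))
log_prob = dist.log_prob(action)          # log PI(a|s, theta)

# PyTorch optimizers minimize, so negate to get gradient *ascent*.
actor_loss = -delta.detach() * log_prob
actor_loss.backward()                     # computes df/dx for every
                                          # parameter x chained to the loss
```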
2) You are indeed correct that the weights are updated via the optimization methods in PyTorch (the gradients themselves come from autograd). However, in order to complete the optimization step, you must construct an optimizer from torch.optim over the network's parameters (e.g. its weights and biases) and call its .step() method after .backward().
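Here's a self-contained sketch of one full update under the same assumptions as above, plus placeholder names `critic`, `next_state`, `reward`, and `gamma` for the discount factor. Plain SGD is used so that .step() matches your update equations exactly; any other torch.optim optimizer (e.g. Adam) works the same way mechanically:

```python
import torch

# Assumed: `actor` and `critic` are nn.Modules; `state`, `next_state`,
# `action`, and `reward` are tensors for a single transition; `gamma`,
# `alpha`, and `beta` are floats.
actor_opt = torch.optim.SGD(actor.parameters(), lr=alpha)
critic_opt = torch.optim.SGD(critic.parameters(), lr=beta)

value = critic(state)                                   # V(s, w)
target = reward + gamma * critic(next_state).detach()   # bootstrap target
delta = target - value                                  # TD error

log_prob = torch.distributions.Categorical(
    logits=actor(state)).log_prob(action)               # log PI(a|s, theta)

actor_opt.zero_grad()
critic_opt.zero_grad()

actor_loss = -delta.detach() * log_prob    # gradient ascent on theta
critic_loss = 0.5 * delta.pow(2)           # semi-gradient descent on w
(actor_loss + critic_loss).backward()      # autograd fills in .grad

actor_opt.step()   # theta <- theta + alpha * delta * Grad(log(PI(a|s, theta)))
critic_opt.step()  # w     <- w + beta * delta * Grad(V(s, w))
```

In practice you would construct the two optimizers once, outside the training loop; they're shown inline here only to keep the sketch self-contained.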