In A2C, the advantage actor-critic algorithm, the weights are updated via the equations:

delta = TD error,
theta = theta + alpha * delta * Grad(log(PI(a|s, theta))), and
w = w + beta * delta * Grad(V(s, w)).
So my question is: when implementing this with neural networks, how are the gradients calculated, and am I correct that the weights are updated via the optimization methods in TensorFlow or PyTorch?
Thanks, Jon
I'm not quite clear on what you mean to update with w, but I'll answer the question for theta, assuming it denotes the parameters of the actor model.
1) Gradients can be calculated in a variety of ways, but focusing on PyTorch, you can call .backward() on f(x) = alpha * delta * log(PI(a|s, theta)), which will compute df/dx for every parameter x that is chained to f(x) via autograd. (In practice you call .backward() on the negative of this expression, since PyTorch optimizers perform gradient descent rather than ascent.)
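For concreteness, here's a minimal sketch of that call for a discrete action space. The names `actor`, `state`, `action`, and `delta` are placeholders for your own objects, and alpha is left out here because it's normally supplied as the optimizer's learning rate:

```python
import torch

# Minimal sketch, assuming `actor` is an nn.Module mapping a state tensor
# to action logits, and `delta` is the TD error as a scalar tensor computed
# elsewhere (detached so the actor's loss doesn't backprop into the critic).
dist = torch.distributions.Categorical(logits=actor(state))
log_prob = dist.log_prob(action)          # log PI(a|s, theta)

# PyTorch optimizers minimize, so negate to get gradient *ascent*.
actor_loss = -delta.detach() * log_prob
actor_loss.backward()                     # computes df/dx for every
                                          # parameter x chained to the loss
```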
2) You are indeed correct that the weights are updated via the optimization methods in PyTorch (the gradients themselves come from autograd). However, in order to complete the optimization step, you must construct an optimizer from torch.optim over the network's parameters (e.g. its weights and biases) and call its .step() method after .backward().
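Here's a self-contained sketch of one full update under the same assumptions as above, plus placeholder names `critic`, `next_state`, `reward`, and `gamma` for the discount factor. Plain SGD is used so that .step() matches your update equations exactly; any other torch.optim optimizer (e.g. Adam) works the same way mechanically:

```python
import torch

# Assumed: `actor` and `critic` are nn.Modules; `state`, `next_state`,
# `action`, and `reward` are tensors for a single transition; `gamma`,
# `alpha`, and `beta` are floats.
actor_opt = torch.optim.SGD(actor.parameters(), lr=alpha)
critic_opt = torch.optim.SGD(critic.parameters(), lr=beta)

value = critic(state)                                   # V(s, w)
target = reward + gamma * critic(next_state).detach()   # bootstrap target
delta = target - value                                  # TD error

log_prob = torch.distributions.Categorical(
    logits=actor(state)).log_prob(action)               # log PI(a|s, theta)

actor_opt.zero_grad()
critic_opt.zero_grad()

actor_loss = -delta.detach() * log_prob    # gradient ascent on theta
critic_loss = 0.5 * delta.pow(2)           # semi-gradient descent on w
(actor_loss + critic_loss).backward()      # autograd fills in .grad

actor_opt.step()   # theta <- theta + alpha * delta * Grad(log(PI(a|s, theta)))
critic_opt.step()  # w     <- w + beta * delta * Grad(V(s, w))
```

In practice you would construct the two optimizers once, outside the training loop; they're shown inline here only to keep the sketch self-contained.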