For starters: this question is not asking for help with reinforcement learning (RL); RL is only used as an example.
The Keras documentation contains an example actor-critic reinforcement learning implementation using GradientTape. Basically, they've created a model with two separate outputs: one for the actor (n actions) and one for the critic (1 reward). The following lines describe the backpropagation process (found somewhere in the code example):
# Backpropagation
loss_value = sum(actor_losses) + sum(critic_losses)
grads = tape.gradient(loss_value, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
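For context, the example's model has a shared hidden layer feeding the two output heads. A minimal sketch of such a two-output model, loosely following the example's structure (the layer names "actor" and "critic" are my own, not from the example), could look like this:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_inputs = 4   # CartPole observation size in the Keras example
num_actions = 2  # CartPole action space in the Keras example

# Shared body with two heads: n action probabilities and 1 value estimate.
inputs = layers.Input(shape=(num_inputs,))
common = layers.Dense(128, activation="relu")(inputs)
action = layers.Dense(num_actions, activation="softmax", name="actor")(common)
critic = layers.Dense(1, name="critic")(common)
model = keras.Model(inputs=inputs, outputs=[action, critic])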
Although the actor and critic losses are calculated differently, they sum those two losses to obtain a single loss value that is used for calculating the gradients.
When looking at this code example, one question came to my mind: is there a way to calculate the gradients of the output layers with respect to their corresponding losses, i.e. calculate the gradients of the first n output nodes based on the actor loss and the gradient of the last output node based on the critic loss? To my understanding, this would be much more convenient than adding the two (different!) losses and updating the gradients from that combined value. Do you agree?
Well, after some research I found the answer myself: it is possible to extract the trainable variables of a given layer based on the layer name. We can then apply tape.gradient and optimizer.apply_gradients to the extracted set of trainable variables. My current solution is pretty slow, but it works; I just need to figure out how to improve its runtime.
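As a sketch of that approach, assuming the layer names from the model above, plus hypothetical helpers compute_actor_loss / compute_critic_loss standing in for the example's loss computation, and an existing optimizer and current state:

import tensorflow as tf

# Collect each head's variables by layer name; everything else belongs
# to the shared part of the network.
actor_vars = model.get_layer("actor").trainable_variables
critic_vars = model.get_layer("critic").trainable_variables
head_refs = {v.ref() for v in actor_vars + critic_vars}
shared_vars = [v for v in model.trainable_variables if v.ref() not in head_refs]

# A persistent tape is required because gradient() is called several times.
with tf.GradientTape(persistent=True) as tape:
    action_probs, critic_value = model(state)        # state: current observation
    actor_loss = compute_actor_loss(action_probs)    # hypothetical stand-ins for
    critic_loss = compute_critic_loss(critic_value)  # the example's loss terms

# Each head gets gradients from "its" loss only; the shared layers depend
# on both losses, so they are still updated from the summed loss.
actor_grads = tape.gradient(actor_loss, actor_vars)
critic_grads = tape.gradient(critic_loss, critic_vars)
shared_grads = tape.gradient(actor_loss + critic_loss, shared_vars)

optimizer.apply_gradients(zip(actor_grads, actor_vars))
optimizer.apply_gradients(zip(critic_grads, critic_vars))
optimizer.apply_gradients(zip(shared_grads, shared_vars))

del tape  # release the resources held by the persistent tape

The extra tape.gradient and apply_gradients calls, together with the persistent tape, add overhead compared to a single backward pass over the summed loss, which is probably where most of my slowdown comes from.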