Tags: tensorflow, neural-network, deep-learning, backpropagation

Why does `tf.train.Optimizer().compute_gradients(loss)` also return variables that are not in the subgraph of `loss`?


I'm manually collecting gradient statistics for a multi-task model, whose graph looks schematically like this:

input -> [body_var1 ... body_varN] --> [task1_var1 ... task1_varM] <-- loss_1
                                   \-> [task2_var1 ... task2_varM] <-- loss_2
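For concreteness, here is a minimal sketch of how such a graph could be built; the layer sizes and the 'body'/'task1'/'task2' scope names are just illustrative placeholders, not my actual model:

import tensorflow as tf

# shared body feeding both task heads (sizes are arbitrary placeholders)
with tf.variable_scope('body'):
    inputs = tf.placeholder(tf.float32, [None, 128])
    body_out = tf.layers.dense(inputs, 64, activation=tf.nn.relu)

losses = {}
for task_index, scope in ((1, 'task1'), (2, 'task2')):
    with tf.variable_scope(scope):
        logits = tf.layers.dense(body_out, 10)
        labels = tf.placeholder(tf.int32, [None])
        losses[task_index] = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=labels, logits=logits))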

I'm defining a separate optimizer for each loss as follows (the actual code is much more complicated; the following is simplified for this question):

# for simplicity, just demonstrate the case with the 1st task
task_index = 1

# here we define the optimizer (create an instance in the graph)
loss = losses[task_index]
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)  # learning rate value is arbitrary here
grads_and_vars = optimizer.compute_gradients(loss)

# now let's see what it returns
for g, v in grads_and_vars:
    print('  grad:', g, ', var:', v)

So, the code above clearly creates a separate optimizer only for the branch of task 1; then we create the gradient-computation ops with optimizer.compute_gradients(loss) and print the variables to which the gradients apply.

Expected results:

grad: body_var1_grad, var: body_var1    # \
...                                     # --> body vars and gradients
grad: body_varN_grad, var: body_varN    # /
grad: task1_var1_grad, var: task1_var1  # \
...                                     # --> task 1 vars and gradients
grad: task1_varM_grad, var: task1_varM  # /

So I'm expecting the optimizer to contain gradient-computation ops only for the branch it was applied to (i.e. the branch of task 1).

Actual results:

grad: body_var1_grad, var: body_var1    # \
...                                     # --> body vars and gradients
grad: body_varN_grad, var: body_varN    # /
grad: task1_var1_grad, var: task1_var1  # \
...                                     # --> task 1 vars and gradients
grad: task1_varM_grad, var: task1_varM  # /
grad: None, var: task2_var1             # \
...                                     # --> task 2 vars, with None gradients
grad: None, var: task2_varM             # /

So it looks like optimizer.compute_gradients(loss) captures not only the sub-graph that feeds into loss (which could be extracted with tf.graph_util.extract_sub_graph), but also all other trainable variables in the graph, without creating gradient ops for them (so the returned gradients for those variables are None).
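As a workaround while collecting the statistics, the None entries can simply be filtered out, e.g.:

# keep only the pairs whose gradient op actually exists in the graph
grads_and_vars = [(g, v) for g, v in grads_and_vars if g is not None]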

Question: is such behavior normal?


Solution

  • Yes, it is: compute_gradients() computes gradients of loss with respect to the list of tf.Variable objects passed via the var_list parameter. If var_list is not provided, it computes gradients with respect to all variables in the GraphKeys.TRAINABLE_VARIABLES collection. And if loss does not depend on some of those variables, the gradient of loss with respect to them is undefined, so None is returned for them. Based on the code you provided, this is exactly what happens.

    If you want the optimizer to compute gradients only with respect to certain variables, build a list of those variables and pass it as the var_list argument of compute_gradients(), as sketched below.
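    For example, assuming the shared body and the task-1 head live under variable scopes named 'body' and 'task1' (adjust the scope names to whatever your graph actually uses), the list could be built like this:

    # collect only the variables that loss_1 actually depends on:
    # the shared body variables plus task 1's own head variables
    var_list = (tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='body') +
                tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='task1'))

    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    grads_and_vars = optimizer.compute_gradients(loss, var_list=var_list)

    # every returned gradient now corresponds to a variable the loss depends on,
    # so no None entries appear
    for g, v in grads_and_vars:
        print('  grad:', g, ', var:', v)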