Tags: tensorflow, optimization, gradient, autodiff

Does TensorFlow's gradient compute the derivative of functions with an unknown dependency on the decision variable?


I would appreciate it if you could answer my questions or point me to useful resources.

Currently, I am working on a problem that requires alternating optimization. Consider two decision variables, x and y. In the first step, I take the derivative of the loss function with respect to x (for fixed y) and update x. In the second step, I need to take the derivative with respect to y. The issue is that x depends on y implicitly, and finding a closed form of the cost function that exposes the dependency of x on y is not feasible, so the gradients of the cost function with respect to y are unknown.
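For concreteness, here is a toy sketch of such an alternating scheme (the quadratic loss, learning rate, and step count are placeholders, not my actual model), written with TF2-style eager execution:

```python
import tensorflow as tf

x = tf.Variable(1.0)
y = tf.Variable(2.0)

def loss_fn():
    # Toy stand-in for the real cost; the x/y coupling is explicit here
    # only so the example is self-contained.
    return (x - y) ** 2 + 0.1 * y ** 2

opt = tf.keras.optimizers.SGD(learning_rate=0.1)

for step in range(50):
    # Step 1: gradient w.r.t. x with y treated as a constant, then update x.
    with tf.GradientTape() as tape:
        loss = loss_fn()
    grad_x = tape.gradient(loss, x)
    opt.apply_gradients([(grad_x, x)])

    # Step 2: gradient w.r.t. y with x treated as a constant, then update y.
    with tf.GradientTape() as tape:
        loss = loss_fn()
    grad_y = tape.gradient(loss, y)
    opt.apply_gradients([(grad_y, y)])
```

Note that each tape only differentiates the forward computation of the current step; the implicit dependence of x on y that builds up through earlier update steps is not tracked, which is exactly the difficulty described above.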

1) My first question is whether the reverse-mode "autodiff" used in TensorFlow works for problems like this, where we do not have an explicit form of the cost function with respect to one of the variables but still need its derivatives. The value of the cost function is known at each step, but its dependency on the decision variable cannot be written down analytically.

2) From a more general point of view, if I define a node as a "tf.Variable" and build an arbitrary, intractable function of that variable (intractable to derive by hand) that evolves through code execution, is it possible to calculate the gradients via "tf.gradients"? If yes, how can I make sure that it is implemented correctly? Can I check it using TensorBoard?
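As an illustration of what I mean (a made-up stand-in for my real graph, using TF1-style graph mode via tf.compat.v1): the loss below has no hand-derived closed form in k, since it is produced by a Python loop, yet every step is a differentiable TF op, so tf.gradients can follow the path, and a finite-difference spot-check confirms the result:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()   # TF1-style graph mode

k = tf.Variable(0.5, name="k")

# Build a function of k with no hand-written closed form: an ordinary Python
# loop, but every iteration adds differentiable TF ops to the graph.
h = k
for _ in range(10):
    h = tf.sin(h) + 0.1 * h * h
loss = tf.square(h)

grad_k = tf.gradients(loss, k)[0]        # would be None if no path existed

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    g_autodiff = sess.run(grad_k)

    # Spot-check the autodiff result against a central finite difference.
    eps = 1e-4
    sess.run(k.assign(0.5 + eps))
    loss_plus = sess.run(loss)
    sess.run(k.assign(0.5 - eps))
    loss_minus = sess.run(loss)
    print(g_autodiff, (loss_plus - loss_minus) / (2 * eps))
```

TensorBoard's graph view can also show whether any ops connect "k" to the loss, but a numerical spot-check like the one above is a more direct test of correctness.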


My model is too complicated to post here, but a simplified form can be described as follows: suppose the loss function for my model is L(x). I can code L(x) as a function of "x" during the construction phase in TensorFlow. However, I also have another variable "k" that is initialized to zero. The dependency of L(x) on "k" takes shape as the code runs, so my loss function is really L(x, k). More importantly, "x" is implicitly a function of "k". (All the optimization is done using GradientDescent.) The problem is that I do not have L(x, k) in closed form, but I do have its value at each step. I could use numerical methods like FDSA/SPSA, but they are not exact. I just need to make sure, as you said, that there is a path between "k" and L(x, k), but I am not sure how to verify that.
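My best guess at a check (a toy sketch, not my actual model) is to call tf.gradients(loss, k) and see whether it returns a tensor or None; if k's current value were pulled out of the graph into plain Python (e.g. via sess.run) and fed back into the loss, the dependency would be lost and the gradient would come back as None:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

k = tf.Variable(0.0, name="k")
x = tf.Variable(1.0, name="x")

# Case A: k reaches the loss only through TF ops, so a differentiable path
# exists and tf.gradients returns a tensor.
loss_a = tf.square(x - tf.exp(k))
print(tf.gradients(loss_a, k))   # [<tf.Tensor 'gradients/...'>]

# Case B: k's value has been pulled out into a plain Python number before
# being used, so the graph never records the dependency: no path, and the
# gradient comes back as None.
k_value = 0.0                    # e.g. a number obtained via sess.run(k)
loss_b = tf.square(x - tf.exp(k_value))
print(tf.gradients(loss_b, k))   # [None]
```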


Solution

  • TensorFlow gradients only work when the graph connecting x and y (when you are computing dy/dx) has at least one path that contains only differentiable operations. In general, if TensorFlow gives you a gradient, it is correct (otherwise file a bug, but gradient bugs are rare, since the gradient of every differentiable op is well tested and the chain rule is fairly easy to apply).

    Can you be a little more specific about what your model looks like? You might also want to use eager execution if your forward computation is too weird to express as a fixed dataflow graph; see the sketch below.
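For example, a minimal eager-mode sketch (the forward function here is invented just to show the mechanics): tf.GradientTape records whichever ops actually execute, arbitrary Python control flow and all, and tape.gradient reports a missing differentiable path by returning None for that variable.

```python
import tensorflow as tf   # TF 2.x, where eager execution is the default

k = tf.Variable(0.0)
x = tf.Variable(1.0)

def forward(k, x):
    # Arbitrary Python control flow is fine in eager mode; the tape records
    # whichever ops actually run on this call.
    h = x
    for i in range(5):
        if i % 2 == 0:
            h = h * tf.cos(k + float(i))
        else:
            h = h + tf.square(k)
    return tf.square(h)

with tf.GradientTape() as tape:
    loss = forward(k, x)

grads = tape.gradient(loss, [k, x])
print(grads)   # a None entry means the tape saw no path to that variable
```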