
How does GradientDescentOptimizer.minimize() work?


In regard to TensorFlow, I am confused about how GradientDescentOptimizer.minimize() actually works. To be more specific, in the code below, how does calling minimize(error) modify m and b so that when I later call sess.run([m, b]), they return the modified values? I can't see any explicit connection between minimize() and the Variables m and b, yet the result at the end of the following code shows they changed:

# Actual data
x_data = np.linspace(0, 10, 10) + np.random.uniform(-1.5, 1.5, 10)
y_label = np.linspace(0, 10, 10) + np.random.uniform(-1.5, 1.5, 10)

# Random variables --> These variables will be modified by minimize()
m = tf.Variable(0.44)
b = tf.Variable(0.87)

error = 0

for x, y in zip(x_data, y_label):
    error += (y - (m*x + b)) ** 2

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train = optimizer.minimize(error)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    training_steps = 100

    for i in range(training_steps):
        sess.run(train)

    final_slope, final_intercept = sess.run([m, b])

    print(final_slope, final_intercept) # 0.7535087, 0.83729243

Solution

  • The link between your optimizer and variables like m and b is the trainable flag on tf.Variable.

    Trainable variables

    By default, tf.Variable is created with trainable=True, which is the case in your code, and minimize() picks up every such variable and tries to optimize it. You can set this parameter to False to exclude any variable from training.

    m = tf.Variable(0.44, trainable=False)
    b = tf.Variable(0.87)
    

    The output in this case is

    0.44 2.134535
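The mechanics can be sketched without TensorFlow: for each trainable variable, minimize() differentiates the loss and moves the variable a small step downhill (var -= learning_rate * gradient). Below is a NumPy stand-in for what each sess.run(train) roughly does to m and b, using deterministic data (y = 2x + 1) instead of the noisy data above; gradient_step is a hypothetical helper, not TensorFlow API:

```python
import numpy as np

def gradient_step(m, b, x, y, lr=0.001):
    """One gradient-descent update on the squared error
    sum((y - (m*x + b))**2), mirroring sess.run(train)."""
    residual = y - (m * x + b)
    grad_m = np.sum(-2 * x * residual)  # d(error)/dm
    grad_b = np.sum(-2 * residual)      # d(error)/db
    return m - lr * grad_m, b - lr * grad_b

# deterministic stand-in data: true slope 2, true intercept 1
x = np.linspace(0, 10, 10)
y = 2 * x + 1

m, b = 0.44, 0.87
for _ in range(100):
    m, b = gradient_step(m, b, x, y)

print(m, b)  # m approaches 2; b drifts toward 1 more slowly
```

The variables change between steps only because each call feeds the previous (m, b) back in, which is exactly the statefulness that tf.Variable provides inside the session.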

    Explicitly passing var_list

    It is possible to collect all trainable variables in code:

    variables = tf.trainable_variables()       # every Variable with trainable=True
    allvariables = [var for var in variables]  # copy of the full list
    
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train = optimizer.minimize(error,var_list=variables)
    

    So even if the loss is not m*x + b but some other expression, we can choose exactly which variables to optimize.
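To see what restricting var_list means operationally, here is a plain-Python sketch (minimize_step, params, and grads are hypothetical names, not TensorFlow API): only variables named in the list receive the var -= learning_rate * gradient update.

```python
# Hypothetical sketch of minimize(error, var_list=...):
# only variables named in var_list receive a gradient update.
def minimize_step(params, grads, var_list, lr=0.001):
    """params and grads are dicts keyed by variable name."""
    return {name: value - lr * grads[name] if name in var_list else value
            for name, value in params.items()}

params = {"m": 0.44, "b": 0.87}
grads = {"m": -1.2, "b": -0.6}  # made-up gradient values
updated = minimize_step(params, grads, var_list=["b"])
print(updated)  # "m" is untouched, only "b" moves downhill
```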

    There are probably other, more advanced ways to control this, for example filtering by variable scope:

    with tf.variable_scope('discriminator'):
        c = tf.Variable(1.0)
    
    variables = tf.trainable_variables()
    allvariables = [var for var in variables if var.name.startswith("discriminator")]
    

    Here allvariables contains just c.
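The prefix filter itself is plain Python and can be checked with stand-in objects (FakeVar is a hypothetical class, not TensorFlow API); variable_scope simply prefixes each variable's name with "discriminator/":

```python
# Stand-in for scoped variables, to show how the name filter works
class FakeVar:
    def __init__(self, name):
        self.name = name

# variable_scope('discriminator') prefixes the variable's name
variables = [FakeVar("discriminator/c:0"), FakeVar("m:0"), FakeVar("b:0")]

# same comprehension as above: keep only discriminator-scoped variables
disc_vars = [v for v in variables if v.name.startswith("discriminator")]
print([v.name for v in disc_vars])  # only the scoped variable survives
```

This pattern is common in GAN training, where the generator and discriminator each get their own optimizer over their own scoped variables.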