I'm trying to run through a simple linear regression example in Tensorflow, and it appears that the training algorithm is converging to a solution, but once it gets close to the solution, it starts bouncing around and eventually blows up.
I'm passing in data for the line y = 2x, so the gradient descent optimizer should be able to converge to a solution easily.
import tensorflow as tf
M = tf.Variable([0.4], dtype=tf.float32)
b = tf.Variable([-0.4], dtype=tf.float32)
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
linear_model = M * x + b
error = linear_model - y
loss = tf.square(error)
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    for i in range(100):
        sess.run(optimizer, {x: i, y: 2 * i})
        print(sess.run([M, b]))
Here's the result (I circled the portion where it gets close to a solution). Why does gradient descent break down once it gets close to the solution, or is there something I'm doing wrong?
Your code feeds the training data one sample at a time for only one epoch. This corresponds to stochastic gradient descent, where the loss tends to fluctuate more than with batch or mini-batch gradient descent. Moreover, since the samples are fed in increasing order of x, the magnitude of the gradient also grows with x, which is why the fluctuations get larger toward the end of the epoch. Concretely, the squared-error loss has curvature 2x² with respect to M, so a learning rate of 0.01 is only stable while x² < 100; once x reaches about 10, a single update overshoots the minimum and the iterates start to diverge.
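If it helps, here is a minimal sketch of how the same model can be trained stably, assuming you shuffle the samples each epoch and shrink the learning rate to account for the largest x. The specific learning rate (0.00005) and the epoch count are illustrative choices, not the only fix; normalizing the inputs or averaging the loss over a full batch would also work.
import numpy as np
import tensorflow as tf

# Same model as in the question.
M = tf.Variable([0.4], dtype=tf.float32)
b = tf.Variable([-0.4], dtype=tf.float32)
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
linear_model = M * x + b
loss = tf.square(linear_model - y)

# With x up to 99 the per-sample curvature in M is roughly 2 * 99**2, so the
# step size has to stay well below 1e-4 for a single update not to overshoot.
train_op = tf.train.GradientDescentOptimizer(0.00005).minimize(loss)

xs = np.arange(100, dtype=np.float32)
ys = 2.0 * xs

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        # Shuffle every epoch so the large-x samples are not all seen last.
        for i in np.random.permutation(len(xs)):
            sess.run(train_op, {x: xs[i], y: ys[i]})
        if epoch % 10 == 0:
            print(epoch, sess.run([M, b]))
Because the problem is poorly conditioned (x spans 0 to 99), M should settle near 2 much faster than b approaches 0; rescaling or centering x is the usual remedy for that.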