Tags: tensorflow, gradient-descent

Why does my gradient descent optimizer blow up after getting close to a solution?


I'm trying to run through a simple linear regression example in TensorFlow. The training appears to converge, but once it gets close to the solution it starts bouncing around and eventually blows up.

I'm passing data for the line y = 2x, so the gradient descent optimizer should have no trouble converging to a solution.

import tensorflow as tf

M = tf.Variable([0.4], dtype=tf.float32)
b = tf.Variable([-0.4], dtype=tf.float32)

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

linear_model = M * x + b

error = linear_model - y
loss = tf.square(error)

optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
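    # One pass over the data, one sample per step, with x fed in increasing order from 0 to 99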
    for i in range(100):
        sess.run(optimizer, {x: i, y: 2 * i})
        print(sess.run([M, b]))

Here's the result. I circled the portion where it gets close to a solution. Why does gradient descent break down once it gets close to the solution, or is there something I'm doing wrong?

[Screenshot of the printed [M, b] values, with the region where they approach the solution circled shortly before the values blow up.]


Solution

  • Your code feeds the training samples one at a time for only one epoch. This is stochastic gradient descent, where the loss fluctuates more than in batch or mini-batch gradient descent. Moreover, since the data is fed in increasing order of x, the gradient grows along with x, which is why the fluctuations get larger toward the end of the epoch.

  • The blow-up itself is a step-size problem: a single-sample update multiplies the current error at that sample by the factor 1 - 2*lr*(x^2 + 1), and with lr = 0.01 that factor exceeds 1 in magnitude for every x >= 10, so each of those updates overshoots and amplifies the error instead of shrinking it. Shuffling the data, averaging the loss over a mini-batch, and lowering the learning rate all keep the updates stable, as in the sketch below.
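Here's a minimal sketch of one way to stabilize the training, keeping the tf.train-style (TF 1.x) API from the question. The learning rate of 1e-4, the batch size of 10, and the 200 epochs are illustrative choices, not tuned values:

import numpy as np
import tensorflow as tf

M = tf.Variable([0.4], dtype=tf.float32)
b = tf.Variable([-0.4], dtype=tf.float32)

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

linear_model = M * x + b
error = linear_model - y
# Average over the mini-batch so the gradient magnitude does not
# scale with the batch size.
loss = tf.reduce_mean(tf.square(error))

# 2 * 0.01 * (x^2 + 1) > 2 for x >= 10, so the original rate of 0.01
# diverges on the large-x samples; 1e-4 keeps the updates stable here.
optimizer = tf.train.GradientDescentOptimizer(1e-4).minimize(loss)

x_train = np.arange(100, dtype=np.float32)
y_train = 2.0 * x_train
batch_size = 10

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(200):
        # Shuffle each epoch so the large-x samples are spread across
        # the pass instead of being concentrated at its end.
        idx = np.random.permutation(len(x_train))
        for start in range(0, len(x_train), batch_size):
            batch = idx[start:start + batch_size]
            sess.run(optimizer, {x: x_train[batch], y: y_train[batch]})
    print(sess.run([M, b]))  # M should end up close to 2.0; b moves toward 0.0 more slowly

Averaging with tf.reduce_mean keeps the gradient magnitude independent of the batch size, and the per-epoch shuffle spreads the large-x samples across the pass. Note that with x left unnormalized the intercept b still converges slowly; standardizing x first is the usual way to improve the conditioning and permit a larger learning rate.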