I am learning TensorFlow from a Stanford course named "TensorFlow for Deep Learning Research". I took the code from the following address. While exploring TensorFlow I changed
Y_predicted = X * w + b
to
Y_predicted = X * X * w + X * u + b
to check whether a non-linear curve would fit the data better, following the author's suggestion in this note (page 3). But after making this change and running the otherwise identical code, every error value comes out as nan. Can anybody point out the problem and suggest a solution?
""" Simple linear regression example in TensorFlow
This program tries to predict the number of thefts from
the number of fires in the city of Chicago
Author: Chip Huyen
Prepared for the class CS 20SI: "TensorFlow for Deep Learning Research"
cs20si.stanford.edu
"""
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import xlrd
#import utils
DATA_FILE = "slr05.xls"
# Step 1: read in data from the .xls file
book = xlrd.open_workbook(DATA_FILE, encoding_override="utf-8")
sheet = book.sheet_by_index(0)
data = np.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)])
n_samples = sheet.nrows - 1
# Step 2: create placeholders for input X (number of fires) and label Y (number of thefts)
X = tf.placeholder(tf.float32, name='X')
Y = tf.placeholder(tf.float32, name='Y')
# Step 3: create weight and bias, initialized to 0
w = tf.Variable(0.0, name='weights')
u = tf.Variable(0.0, name='weights2')
b = tf.Variable(0.0, name='bias')
# Step 4: build model to predict Y
#Y_predicted = X * w + b
Y_predicted = X * X * w + X * u + b
# Step 5: use the squared error as the loss function
loss = tf.square(Y - Y_predicted, name='loss')
# loss = utils.huber_loss(Y, Y_predicted)
# Step 6: use gradient descent with a learning rate of 0.001 to minimize loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)
with tf.Session() as sess:
    # Step 7: initialize the necessary variables, in this case, w, u and b
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./graphs/linear_reg', sess.graph)
    # Step 8: train the model
    for i in range(100):  # train the model for 100 epochs
        total_loss = 0
        for x, y in data:
            # Session runs the optimizer and fetches the value of loss
            _, l = sess.run([optimizer, loss], feed_dict={X: x, Y: y})
            total_loss += l
        print('Epoch {0}: {1}'.format(i, total_loss/n_samples))
    # close the writer when you're done using it
    writer.close()
    # Step 9: output the values of w, u and b
    w, u, b = sess.run([w, u, b])
# plot the results
X, Y = data.T[0], data.T[1]
plt.plot(X, Y, 'bo', label='Real data')
plt.plot(X, X * X * w + X * u + b, 'r', label='Predicted data')
plt.legend()
plt.show()
Oops! Your learning rate seems too big; try something like learning_rate=0.0000001
and it will converge. This is a common problem, especially when you introduce polynomial features, as in your case: keep in mind that the range of x**2
is much wider than that of x (if the original values lie in [-100, 100], the squared feature spans [0, 10000]), so the learning rate that worked well for the linear model may be too big for the polynomial one. Read up on feature scaling; a minimal sketch of one way to apply it to your script is below.
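Here is a rough sketch of that scaling idea, not the official course code: it assumes the imports, the data array and n_samples from your script, standardizes the fire counts, and rebuilds the same quadratic model under new names so it can keep the original learning rate of 0.001.

# A minimal sketch of feature scaling for the quadratic model.
# Assumes numpy as np, tensorflow as tf, and `data` / `n_samples`
# are already defined exactly as in the script above.
x_raw, y_raw = data[:, 0], data[:, 1]
x_scaled = (x_raw - x_raw.mean()) / x_raw.std()  # zero mean, unit variance

Xs = tf.placeholder(tf.float32, name='X_scaled')
Ys = tf.placeholder(tf.float32, name='Y_scaled')
ws = tf.Variable(0.0, name='w_scaled')
us = tf.Variable(0.0, name='u_scaled')
bs = tf.Variable(0.0, name='b_scaled')

pred = Xs * Xs * ws + Xs * us + bs
loss_s = tf.square(Ys - pred, name='loss_scaled')
# With the input standardized, the original learning rate no longer diverges.
opt = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss_s)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        total_loss = 0
        for x, y in zip(x_scaled, y_raw):
            _, l = sess.run([opt, loss_s], feed_dict={Xs: x, Ys: y})
            total_loss += l
        print('Epoch {0}: {1}'.format(i, total_loss / n_samples))

Either remedy stops the divergence: shrinking the learning rate alone works but converges slowly, while scaling the input lets you keep a reasonable learning rate.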
Hope it helps!
Andres