
Loss not converging in Polynomial regression in Tensorflow


import numpy as np 
import tensorflow as tf


#input data:
x_input=np.linspace(0,10,1000)
y_input=x_input+np.power(x_input,2)

#model parameters
W = tf.Variable(tf.random_normal([2,1]), name='weight')
#bias
b = tf.Variable(tf.random_normal([1]), name='bias')

#placeholders
#X=tf.placeholder(tf.float32,shape=(None,2))
X=tf.placeholder(tf.float32,shape=[None,2])
Y=tf.placeholder(tf.float32)
x_modified=np.zeros([1000,2])

x_modified[:,0]=x_input
x_modified[:,1]=np.power(x_input,2)
#model
#x_new=tf.constant([x_input,np.power(x_input,2)])
Y_pred=tf.add(tf.matmul(X,W),b)

#algorithm
loss = tf.reduce_mean(tf.square(Y_pred -Y ))
#training algorithm
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
#initializing the variables
init = tf.initialize_all_variables()

#starting the session
sess = tf.Session()
sess.run(init)

epoch=100

for step in xrange(epoch):
    # temp=x_input.reshape((1000,1))
    # y_input=temp

    _, c = sess.run([optimizer, loss], feed_dict={X: x_modified, Y: y_input})
    if step % 50 == 0:
        print c

print "Model paramters:"       
print  sess.run(W)
print "bias:%f" %sess.run(b)

I'm trying to implement polynomial regression (quadratic) in TensorFlow. The loss isn't converging. Could anyone please help me out with this? The same logic works for linear regression, though!


Solution

  • First, there is a problem with the shapes of Y_pred and Y:

    • Y has unknown shape, and is fed with an array of shape (1000,)
    • Y_pred has shape (1000, 1)
    • Y - Y_pred will then have shape (1000, 1000)

    This small code will prove my point:

    a = tf.zeros([1000])  # shape (1000,)
    b = tf.zeros([1000, 1])  # shape (1000, 1)
    print (a-b).get_shape()  # prints (1000, 1000)
    

    You should use consistent shapes:

    y_input = y_input.reshape((1000, 1))
    
    Y = tf.placeholder(tf.float32, shape=[None, 1])
    

    Anyway, the loss is exploding because your input values are very large (the squared feature ranges from 0 to 100; you should normalize your inputs), which makes the loss very high at the start of training (around 2000).
    The gradients are then huge, the parameters blow up, and the loss goes to infinity.

    The quickest fix is to lower your learning rate (1e-5 converges for me, albeit very slowly toward the end). You can raise it again once the loss drops to around 1.
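
    For completeness, here is a minimal sketch of the whole fix applied to the code above, using the same TF1-style API as the question. The z-score normalization of the feature columns, the 0.01 learning rate, the 1000 training steps, and the use of tf.global_variables_initializer (instead of the deprecated tf.initialize_all_variables) are my own choices for illustration, not taken from the original post:

    import numpy as np
    import tensorflow as tf

    # design matrix [x, x^2] and target, both with explicit 2-D shapes
    x_input = np.linspace(0, 10, 1000)
    x_modified = np.column_stack([x_input, np.power(x_input, 2)])  # (1000, 2)
    y_input = (x_input + np.power(x_input, 2)).reshape((1000, 1))  # (1000, 1)

    # normalize each feature column so the gradients stay well-behaved
    # (assumption: simple z-score scaling)
    x_modified = (x_modified - x_modified.mean(axis=0)) / x_modified.std(axis=0)

    W = tf.Variable(tf.random_normal([2, 1]), name='weight')
    b = tf.Variable(tf.random_normal([1]), name='bias')

    X = tf.placeholder(tf.float32, shape=[None, 2])
    Y = tf.placeholder(tf.float32, shape=[None, 1])  # matches Y_pred's shape

    Y_pred = tf.add(tf.matmul(X, W), b)
    loss = tf.reduce_mean(tf.square(Y_pred - Y))

    # with normalized inputs a learning rate of 0.01 is fine; with the raw
    # inputs you would need something much smaller (around 1e-5, see above)
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)
        for step in range(1000):
            _, c = sess.run([optimizer, loss],
                            feed_dict={X: x_modified, Y: y_input})
            if step % 100 == 0:
                print(c)
        print("Model parameters:")
        print(sess.run(W))
        print("bias:")
        print(sess.run(b))

    Note that because the regression is done on the normalized features, the learned W no longer equals [1, 1] directly; you would have to undo the scaling if you want the coefficients of the original x and x² terms.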