Tags: python, python-3.x, linear-regression, nan, gradient-descent

How do I fix a function that returns NaN?


I wanted to try to implement gradient descent by myself and I wrote this:

# Creating random sample dataset
import random as rnd 
dataset = []
for i in range(0, 500):
    d_dataset = [i, rnd.randint((i-4), (i+4))]
    dataset.append(d_dataset)

def gradient_descent(t0, t1, lrate, ds):
    length = len(ds)
    c0, c1 = 0, 0
    for element in ds:
        elx = element[0]
        ely = element[1]
        c0 += ((t0 + (t1*elx) - ely)) 
        c1 += ((t0 + (t1*elx) - ely)*elx) 
    t0 -= (lrate * c0 / length)
    t1 -= (lrate * c1 / length)
    return t0, t1

def train(t0, t1, lrate, trainlimit, trainingset):
    k = 0
    while k < trainlimit:
        new_t0, new_t1 = gradient_descent(t0, t1, lrate, trainingset)
        t0, t1 = new_t0, new_t1
        k += 1
    return t0, t1

print(gradient_descent(20, 1, 1, dataset))
print(train(0, 0, 1, 10000, dataset))

Whenever I run this, I get a somewhat normal output from gradient_descent(), but I get (nan, nan) from the train() function. I tried running train() with the input (0, 0, 1, 10, dataset) and got (-4.705770241957691e+46, -1.5670167612541557e+49), which seems very wrong.

Please tell me what I'm doing wrong and how to fix this error. Sorry if this has been asked before, but I couldn't find any answers on how to fix a NaN error.


Solution

  • When you call print(train(0, 0, 1, 10000, dataset)), the values returned by gradient_descent(t0, t1, lrate, trainingset) grow in magnitude on every iteration of the while loop. Once they exceed the largest value a float can represent, the arithmetic overflows to float('inf'), a float representing infinity. You can check this maximum on your system with sys.float_info.max:

    import sys
    print(sys.float_info.max)
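
    On a typical IEEE 754 double this maximum is about 1.8e308. As a quick sanity check (a minimal sketch, not part of your original code), arithmetic whose result exceeds that limit overflows to infinity:

    big = sys.float_info.max
    print(big * 2)    # inf: the product exceeds the representable range
    print(big + big)  # inf: so does the sum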
    

    However, your function gradient_descent() can't handle infinite values, which you can verify with the following call:

    gradient_descent(float('inf'), float('inf'), 1, dataset)
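
    With infinite inputs this call returns (nan, nan), and once t0 and t1 are NaN (or infinite), no later iteration of train() can bring them back to a finite value, which is exactly the (nan, nan) you observed.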
    

    The problem is the following two lines in gradient_descent(), which are not well defined when t0 and t1 are infinite:

    c0 += ((t0 + (t1*elx) - ely)) 
    c1 += ((t0 + (t1*elx) - ely)*elx)
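
    With t0 and t1 infinite, these expressions run into IEEE 754 operations that have no defined result and therefore yield NaN, for example inf * 0 (your first dataset element has elx == 0) and inf - inf:

    inf = float('inf')
    print(inf * 0)    # nan: infinity times zero is undefined
    print(inf - inf)  # nan: infinity minus infinity is undefined

    The root cause of the blow-up is the learning rate: lrate = 1 is far too large for this data (elx ranges up to 499), so every update overshoots the minimum and the parameters diverge until they overflow. A minimal sketch of a fix, assuming the rest of your code stays unchanged, is to use a much smaller step size; the exact value below is a judgment call for this scale of data, not the only valid choice:

    # 0.00001 keeps the updates stable for x-values up to 499; tune as needed
    print(train(0, 0, 0.00001, 10000, dataset))  # should land near t1 ≈ 1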