Tags: python, optimization, neural-network, gradient-descent, calculus

Cost value doesn't decrease when using gradient descent


I have data pairs (x, y) generated by a cubic function

y = g(x) = ax^3 − bx^2 − cx + d

plus some random noise. Now I want to fit a model (parameters a, b, c, d) to this data using gradient descent.
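
For context, here is a minimal sketch of how such data could be produced; the coefficients, x-range, and noise level are illustrative assumptions, not the values behind the data linked below:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
# hypothetical ground-truth coefficients plus Gaussian noise
y = 0.5*x**3 - 0.2*x**2 - 0.1*x + 1.0 + rng.normal(scale=1.0, size=x.shape)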

My implementation:

import numpy as np

# initial parameter guesses
param = {"a": 0.02, "b": 0.001, "c": 0.002, "d": -0.04}

def model(param, x, y, derivative=False):
    """Evaluate the cubic model; optionally return the cost gradients instead."""
    x2 = np.power(x, 2)
    x3 = np.power(x, 3)
    y_hat = param["a"]*x3 + param["b"]*x2 + param["c"]*x + param["d"]
    if not derivative:
        return y_hat
    # partial derivatives of the mean-squared-error cost w.r.t. each parameter
    derv = {}
    m = len(y_hat)
    derv["a"] = (2/m)*np.sum((y_hat - y)*x3)
    derv["b"] = (2/m)*np.sum((y_hat - y)*x2)
    derv["c"] = (2/m)*np.sum((y_hat - y)*x)
    derv["d"] = (2/m)*np.sum(y_hat - y)
    return derv

def cost(y_hat, y):
    """Mean squared error between predictions and targets."""
    assert len(y) == len(y_hat)
    return np.sum(np.power(y_hat - y, 2)) / len(y)

def optimizer(param, x, y, lr=0.01, epochs=100):
    """Plain batch gradient descent; prints the cost every 10 epochs."""
    for i in range(epochs):
        y_hat = model(param, x, y)
        derv = model(param, x, y, derivative=True)
        # gradient-descent update for each parameter
        param["a"] = param["a"] - lr*derv["a"]
        param["b"] = param["b"] - lr*derv["b"]
        param["c"] = param["c"] - lr*derv["c"]
        param["d"] = param["d"] - lr*derv["d"]
        if i % 10 == 0:
            # print(y, y_hat)     # debug: inspect predictions
            # print(param, derv)  # debug: inspect parameters and gradients
            print(cost(y_hat, y))

# x and y hold the data from the pastebin link below
X = np.array(x)
Y = np.array(y)
optimizer(param, X, Y, lr=0.01, epochs=100)
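
For reference, here is how I derived the gradients. The cost is the mean squared error

C = (1/m) * sum((y_hat - y)^2)

so the chain rule gives, for example,

dC/da = (2/m) * sum((y_hat - y) * x^3)

and analogously for b, c, d with x^2, x, and 1 in place of x^3.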

When run, the cost explodes instead of decreasing:

36.140028646153525
181.88127675295928
2045.7925570171055
24964.787906199843
306448.81623701524
3763271.7837247783
46215271.5069297
567552820.2134454
6969909237.010273
85594914704.25394

Did I compute the gradients wrong? I don't know why the cost is exploding.

Here is the data: https://pastebin.com/raw/1VqKazUV.


Solution

  • If I run your code with e.g. lr=1e-4, the cost decreases.

    Check your gradients (just print the result of model(..., derivative=True)); you will see that they are quite large. Since your learning rate is not small enough to compensate, each update overshoots and you oscillate away from the minimum (any ML textbook has example plots of this; you can also see it by printing your parameters after every iteration). A minimal check is sketched below.
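
    For example, assuming model, optimizer, X, and Y are defined as in the question, and re-initializing param so the run starts from the original guesses:

        param = {"a": 0.02, "b": 0.001, "c": 0.002, "d": -0.04}  # reset to the starting point
        print(model(param, X, Y, derivative=True))   # the gradients here are quite large
        optimizer(param, X, Y, lr=1e-4, epochs=100)  # smaller step size -> cost decreases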