Tags: python, machine-learning, gradient-descent

SGD - loss starts increasing after some iterations


I'm trying to implement stochastic gradient descent with two constraints, so I can't use scikit-learn. Unfortunately, I'm already struggling with plain SGD without the two constraints. The loss (squared loss) on the training set drops for some iterations but then starts to increase, as shown in the plots linked below. These are the functions I use:

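import numpy as np

def loss_prime_simple(w, node, feature, data):
    # Derivative of the squared error of sample `node` w.r.t. w[feature].
    # Note: the prediction here is taken to be w[feature] * x_f alone.
    x = data[3]
    y = data[2]
    x_f = x[node][feature]
    y_node = y[node]
    ret = (y_node - w[feature] * x_f) * (-x_f)
    return ret

def update_weights(w, data, predecs, children, node, learning_rate):
    # One SGD step on sample `node`; predecs and children are unused here
    # (placeholders, presumably for the two constraints).
    len_features = len(data[3][0])
    w_new = np.zeros(len_features)  # holds the gradient, not new weights
    for feature_ in range(len_features):
        w_new[feature_] = loss_prime_simple(w, node, feature_, data)
    return w - learning_rate * w_new

def loss_simple(w, data):
    # Sum of squared errors over all samples.
    y_p = data[2]
    x = data[3]
    return ((y_p - np.dot(w, np.array(x).T)) ** 2).sum()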
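For reference, `loss_simple` sums the per-sample squared error. Differentiating it for one sample $i$ with respect to a single weight $w_f$ gives the following (the sum over $g$ runs over all features):

$$
\ell_i(\mathbf{w}) = \left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2,
\qquad
\frac{\partial \ell_i}{\partial w_f} = -2\left(y_i - \sum_{g} w_g\, x_{i,g}\right) x_{i,f}
$$

The factor 2 is commonly folded into the learning rate. Every component of the gradient depends on the full prediction $\mathbf{w}^\top \mathbf{x}_i$, not only on $w_f\, x_{i,f}$.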

This plot shows the training-set loss for two different learning rates (0.001 and 0.0001): http://postimg.org/image/43nbmh8x5/

Can anyone spot a mistake, or does anyone have advice on how to debug this? Thanks
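A standard way to debug a hand-written gradient is a finite-difference gradient check: nudge each weight by a small eps, measure how the loss changes, and compare that with the analytic gradient. The sketch below is an illustration under stated assumptions, not the poster's code: it uses a random 5-feature sample instead of the real data, and the helper names (`loss_one`, `grad_per_feature`, `grad_full`, `numerical_grad`) are hypothetical. `grad_per_feature` mirrors `loss_prime_simple` vectorized over features and multiplied by 2, so that any mismatch comes from the residual itself rather than from a constant factor.

```python
import numpy as np

def loss_one(w, x, y):
    """Squared loss of a single sample: (y - w.x)^2."""
    return (y - np.dot(w, x)) ** 2

def grad_per_feature(w, x, y):
    """Same pattern as loss_prime_simple (times 2): feature f only sees w[f] * x[f]."""
    return -2.0 * (y - w * x) * x

def grad_full(w, x, y):
    """Analytic gradient of loss_one: every feature uses the full prediction w.x."""
    return -2.0 * (y - np.dot(w, x)) * x

def numerical_grad(func, w, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for idx in range(len(w)):
        step = np.zeros_like(w)
        step[idx] = eps
        g[idx] = (func(w + step) - func(w - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x = rng.random(5)   # hypothetical random sample standing in for the real features
y = 2.0
w = rng.random(5)

num = numerical_grad(lambda v: loss_one(v, x, y), w)
print(np.allclose(num, grad_full(w, x, y), atol=1e-5))         # True
print(np.allclose(num, grad_per_feature(w, x, y), atol=1e-5))  # False
```

The full-prediction gradient passes the check while the per-feature version fails, which localizes the problem to the gradient code rather than the learning rate.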

EDIT:

As lejlot pointed out, it would help to have the data. Here is the data I'm using for x (a single sample): http://textuploader.com/5x0f1

y=2

This gives the following loss curve: http://postimg.org/image/o9d97kt9v/

The updated code:

import numpy as np
from matplotlib import pyplot as plt

def loss_prime_simple(w, node, feature, data):
    x = data[3]
    y = data[2]
    x_f = x[node][feature]
    y_node = y[node]
    # Only w[feature] * x_f enters the residual, not the full prediction
    # np.dot(w, x[node]); this is the mistake identified in the Solution.
    return -(y_node - w[feature] * x_f) * x_f

def update_weights(w, data, predecs, children, node, learning_rate):
    len_features = len(data[3][0])
    w_new = np.zeros(len_features)  # gradient vector
    for feature_ in range(len_features):
        w_new[feature_] = loss_prime_simple(w, node, feature_, data)
    return w - learning_rate * w_new

def loss_simple2(w, data):
    y_p = data[2]
    x = data[3]
    return ((y_p - np.dot(w, np.array(x).T)) ** 2).sum()

# One sample with 4096 features; X must be a NumPy array for X.shape below.
X = np.array([[
    # put the array from http://textuploader.com/5x0f1 here
]])
y = [2]

data = None, None, y, X

w = np.random.rand(4096)

a = [loss_simple2(w, data)]

for _ in range(200):
    for j in range(X.shape[0]):
        w = update_weights(w, data, None, None, j, 0.0001)
        a.append(loss_simple2(w, data))

plt.figure()
plt.plot(a)
plt.show()

Solution

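  • The problem was that I computed the gradient component for feature f as -(y - w[f] * x[f]) * x[f], using only that feature's contribution, instead of -(y - np.dot(w, x)) * x[f], which uses the full prediction (the inner product of w and x over all features).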

    So this works:

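    def update_weights(w, x, y, learning_rate):
        # Full prediction for this sample: inner product of w and x
        inner_product = 0.0
        for f_ in range(len(x)):
            inner_product += w[f_] * x[f_]
        # Residual (prediction minus target), shared by every feature
        dloss = inner_product - y
        # Gradient step; w is modified in place and also returned
        for f_ in range(len(x)):
            w[f_] += learning_rate * (-x[f_] * dloss)
        return w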
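To see why the per-feature update makes the loss fall and then rise, here is a toy sketch with a hypothetical three-feature sample (x = [1, 2, 4], y = 2, weights starting at zero), not the poster's data. Under the per-feature update each w[f] is pulled independently toward y / x[f], so the prediction np.dot(w, x) heads toward 3 * y = 6: it passes through y (loss near zero), then overshoots, and the loss settles near (2 - 6)^2 = 16. `full_gradient_step` is a vectorized NumPy equivalent of the corrected `update_weights` above and drives the prediction to y instead. The function names are illustrative only.

```python
import numpy as np

def squared_loss(w, x, y):
    """Squared error of the linear prediction w.x for one sample."""
    return float((y - np.dot(w, x)) ** 2)

def per_feature_step(w, x, y, lr):
    """Update pattern from the question: feature f only sees w[f] * x[f]."""
    grad = -(y - w * x) * x  # element-wise, ignores the other features
    return w - lr * grad

def full_gradient_step(w, x, y, lr):
    """Correct SGD step: every feature uses the full prediction w.x."""
    grad = -(y - np.dot(w, x)) * x
    return w - lr * grad

x = np.array([1.0, 2.0, 4.0])  # hypothetical toy sample
y = 2.0
lr = 0.01

w_bad = np.zeros(3)
w_good = np.zeros(3)
bad_losses, good_losses = [], []
for _ in range(2000):
    w_bad = per_feature_step(w_bad, x, y, lr)
    w_good = full_gradient_step(w_good, x, y, lr)
    bad_losses.append(squared_loss(w_bad, x, y))
    good_losses.append(squared_loss(w_good, x, y))

print(np.dot(w_bad, x))   # approaches 3 * y = 6
print(np.dot(w_good, x))  # approaches y = 2
```

With real data the same mechanism, assuming the step size is small enough for each coordinate to converge, would push the prediction toward (number of nonzero features) * y; whether the curve first drops depends on where the initial prediction lies, which can't be confirmed here without the data.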