
What is the purpose of adding an expcost to stochastic gradient descent?


I am trying to implement SGD based on the scaffold given by Stanford in the first assignment of CS224n. The implementation is in Python. The scaffold is as follows:

def load_saved_params():
    '''A helper function that loads previously saved parameters and resets
    iteration start.'''
    return st, params, state  # st = starting iteration

def save_params(iter, params):
    '''saves the parameters'''

And now the main function (I have marked the statements of interest with multiple hash symbols):

def sgd(f, x0, step, iterations, postprocessing=None, useSaved=False,
        PRINT_EVERY=10):
    """ Stochastic Gradient Descent

    Implement the stochastic gradient descent method in this function.

    Arguments:
    f -- the function to optimize; it should take a single
         argument and yield two outputs, a cost and the gradient
         with respect to the arguments
    x0 -- the initial point to start SGD from
    step -- the step size for SGD
    iterations -- total iterations to run SGD for
    postprocessing -- postprocessing function for the parameters
                      if necessary. In the case of word2vec we will need to
                      normalize the word vectors to have unit length.
    PRINT_EVERY -- specifies how many iterations to output loss

    Return:
    x -- the parameter value after SGD finishes
    """

    # Anneal learning rate every several iterations
    ANNEAL_EVERY = 20000

    if useSaved:
        start_iter, oldx, state = load_saved_params()
        if start_iter > 0:
            x0 = oldx
            step *= 0.5 ** (start_iter / ANNEAL_EVERY)

        if state:
            random.setstate(state)
    else:
        start_iter = 0

    x = x0

    if not postprocessing:
        postprocessing = lambda x: x

    expcost = None ######################################################

    for iter in xrange(start_iter + 1, iterations + 1):
        # Don't forget to apply the postprocessing after every iteration!
        # You might want to print the progress every few iterations.

        cost = None

        ### END YOUR CODE

        if iter % PRINT_EVERY == 0:
            if not expcost:
                expcost = cost
            else:
                expcost = .95 * expcost + .05 * cost ########################
            print "iter %d: %f" % (iter, expcost)

        if iter % SAVE_PARAMS_EVERY == 0 and useSaved:
            save_params(iter, x)

        if iter % ANNEAL_EVERY == 0:
            step *= 0.5

    return x

For my purposes I have no use for expcost, but what is its purpose in this code? Under what circumstances could it be used? And why is it used to modify the cost calculated by the cost function?


Solution

  • If you notice, expcost is only used for printing out the cost. It is just a way of smoothing the reported cost: because each update is computed on a different random batch, the raw cost can jump noticeably from batch to batch, even while the model is steadily improving. The marked line is an exponentially weighted moving average, which damps that batch-to-batch noise so the printed trend is easier to read.
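
    To see the effect in isolation, here is a minimal, self-contained sketch of that update rule (with the same 0.95/0.05 weights as the assignment code; the noisy cost values are simulated, not taken from any real model):

    ```python
    import random

    def ema_update(expcost, cost, decay=0.95):
        """One step of the exponentially weighted moving average used in sgd():
        keep 95% of the running estimate and mix in 5% of the newest raw cost.
        On the first call (expcost is None) the raw cost seeds the average."""
        if expcost is None:
            return cost
        return decay * expcost + (1 - decay) * cost

    random.seed(0)
    expcost = None
    true_cost = 10.0
    raw_jumps, smooth_jumps = [], []
    prev_raw = prev_smooth = None
    for i in range(200):
        # Simulated per-batch cost: a slowly decreasing trend plus batch noise.
        true_cost *= 0.99
        cost = true_cost + random.uniform(-2.0, 2.0)
        expcost = ema_update(expcost, cost)
        if prev_raw is not None:
            raw_jumps.append(abs(cost - prev_raw))
            smooth_jumps.append(abs(expcost - prev_smooth))
        prev_raw, prev_smooth = cost, expcost

    # The smoothed series moves far less step-to-step than the raw costs,
    # while still tracking the downward trend.
    print(sum(raw_jumps) / len(raw_jumps), sum(smooth_jumps) / len(smooth_jumps))
    ```

    So the only circumstance where expcost matters is diagnostics: when you want the printed loss to reflect the trend rather than the noise of the most recent batch. It has no effect on the parameter updates themselves.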