I am trying to implement SGD based on the scaffold given by Stanford in the first assignment of CS224n. The implementation is in Python. The scaffold is as follows:
def load_saved_params():
    '''A helper function that loads previously saved parameters and resets
    iteration start.'''
    return st, params, state  # st = starting iteration


def save_params(iter, params):
    '''saves the parameters'''
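For reference, the scaffold only provides these stubs. I imagine the helpers look roughly like this (my own sketch: the pickle-based checkpoint files, the saved_params_*.npy naming, and the SAVE_PARAMS_EVERY constant are my assumptions, not the actual starter code):

import glob
import os.path as op
import pickle
import random

SAVE_PARAMS_EVERY = 5000  # assumed checkpoint frequency used by sgd() below


def load_saved_params():
    '''A helper function that loads previously saved parameters and resets
    iteration start.'''
    st = 0
    # find the most recent checkpoint by iteration number encoded in the filename
    for fname in glob.glob("saved_params_*.npy"):
        it = int(op.splitext(op.basename(fname))[0].split("_")[2])
        st = max(st, it)
    if st > 0:
        with open("saved_params_%d.npy" % st, "rb") as f:
            params = pickle.load(f)
            state = pickle.load(f)
        return st, params, state
    return st, None, None


def save_params(iter, params):
    '''saves the parameters'''
    with open("saved_params_%d.npy" % iter, "wb") as f:
        pickle.dump(params, f)
        pickle.dump(random.getstate(), f)  # also save the RNG state for resuming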
And now the main function (I have marked the statements of interest with multiple hash symbols):
def sgd(f, x0, step, iterations, postprocessing=None, useSaved=False,
        PRINT_EVERY=10):
    """ Stochastic Gradient Descent

    Implement the stochastic gradient descent method in this function.

    Arguments:
    f -- the function to optimize, it should take a single
         argument and yield two outputs, a cost and the gradient
         with respect to the arguments
    x0 -- the initial point to start SGD from
    step -- the step size for SGD
    iterations -- total iterations to run SGD for
    postprocessing -- postprocessing function for the parameters
                      if necessary. In the case of word2vec we will need to
                      normalize the word vectors to have unit length.
    PRINT_EVERY -- specifies how many iterations to output loss

    Return:
    x -- the parameter value after SGD finishes
    """

    # Anneal learning rate every several iterations
    ANNEAL_EVERY = 20000

    if useSaved:
        start_iter, oldx, state = load_saved_params()
        if start_iter > 0:
            x0 = oldx
            step *= 0.5 ** (start_iter / ANNEAL_EVERY)
        if state:
            random.setstate(state)
    else:
        start_iter = 0

    x = x0

    if not postprocessing:
        postprocessing = lambda x: x

    expcost = None  ######################################################

    for iter in xrange(start_iter + 1, iterations + 1):
        # Don't forget to apply the postprocessing after every iteration!
        # You might want to print the progress every few iterations.
        cost = None
        ### END YOUR CODE

        if iter % PRINT_EVERY == 0:
            if not expcost:
                expcost = cost
            else:
                expcost = .95 * expcost + .05 * cost  ########################
            print "iter %d: %f" % (iter, expcost)

        if iter % SAVE_PARAMS_EVERY == 0 and useSaved:
            save_params(iter, x)

        if iter % ANNEAL_EVERY == 0:
            step *= 0.5

    return x
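For context, my understanding is that the standard SGD update goes in the part marked as YOUR CODE, roughly along these lines (a minimal sketch assuming f returns a (cost, gradient) pair and postprocessing maps parameters to parameters; this is my guess, not necessarily the intended solution):

cost, grad = f(x)        # f returns the cost and the gradient at the current parameters
x = x - step * grad      # take a step against the gradient with the current learning rate
x = postprocessing(x)    # e.g. renormalize word vectors to unit length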
For my purposes I have no use for expcost, but what is its purpose in the code? Under what circumstances could it be used? Why is it used to modify the cost calculated by the cost function?
If you look closely, expcost is only used for printing out the cost. It is an exponentially weighted moving average of the per-batch cost, i.e. just a way of smoothing the reported loss: the raw cost can jump around noticeably from batch to batch even while the model is steadily improving, and the smoothed value makes the overall trend easier to read. It has no effect on the optimization itself.
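As a toy illustration (my own example, with made-up cost values and the smoothing applied every iteration for simplicity; only the 0.95/0.05 weights come from the scaffold):

import random

expcost = None
for it in range(1, 101):
    # a noisy per-batch cost that trends downward, standing in for the cost returned by f
    cost = 1.0 / it + random.uniform(-0.2, 0.2)
    if expcost is None:
        expcost = cost
    else:
        # exponential moving average: mostly the old estimate, a little of the new cost
        expcost = 0.95 * expcost + 0.05 * cost
    if it % 10 == 0:
        print("iter %d: raw cost %.3f, smoothed %.3f" % (it, cost, expcost))

The smoothed column trends downward much more steadily than the raw one, which is the only thing expcost is there for.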