Tags: machine-learning, gradient-descent, stochastic

Is the cost function for Stochastic Gradient Descent calculated over all rows or only over the row of the current iteration?


In SGD, I want to understand whether the cost is calculated over all rows before the parameters are updated again (by moving to the next row), or whether the cost is calculated only over the current row before updating the parameters.


Solution

  • In stochastic gradient descent, you update the parameters using batches. If your training set has N examples (= rows), you use only B of them for each parameter update, where B <= N. Those B examples should be chosen at random from the N examples on each update iteration (either by sampling with replacement, sampling without replacement, or simply shuffling the training set in advance). So you calculate the gradient of the cost using B examples each time. (Notice that you do not actually need to calculate the cost itself, only its gradient.) B can in particular be equal to N, and it can also be equal to 1 (which is called online learning). See the sketch after this list.

    In addition, you sometimes want to see some metrics of the learning during the optimization process. For example, every once in a while you may want to see the value of the cost on the entire training set (this can help with the termination condition), or the value of the cost on the entire validation set (for example, when monitoring to make sure you are not over-fitting). In these cases, you do calculate the cost (and not the gradient) over the entire set.
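Here is a minimal sketch of both points in Python, assuming a linear-regression model with a mean-squared-error cost; the names (`X`, `y`, `batch_size`, `lr`) and the synthetic data are illustrative assumptions, not from the original answer. Each parameter update uses only the gradient on a random batch of B rows, while the cost on the entire training set is computed only occasionally for monitoring.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: N examples (rows), 3 features.
N = 1000
X = rng.normal(size=(N, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=N)

def cost(w, X, y):
    """Mean squared error over a given set of rows."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)

def gradient(w, Xb, yb):
    """Gradient of the cost over one batch of B rows only."""
    residual = Xb @ w - yb
    return Xb.T @ residual / len(yb)

w = np.zeros(3)
lr = 0.1
batch_size = 32          # B; set batch_size = 1 for online learning
n_steps = 500

for step in range(n_steps):
    # Choose B rows at random (here: sampling without replacement
    # within each step). Only the gradient on these rows is needed
    # for the parameter update, never the cost itself.
    idx = rng.choice(N, size=batch_size, replace=False)
    w -= lr * gradient(w, X[idx], y[idx])

    # Occasionally evaluate the cost on the *entire* training set,
    # purely for monitoring / termination checks, not for the update.
    if step % 100 == 0:
        print(f"step {step}: full-training-set cost = {cost(w, X, y):.4f}")

print("learned weights:", w)
```

Setting `batch_size = 1` gives the online-learning case described above, and `batch_size = N` recovers full-batch gradient descent.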