apache-spark apache-spark-mllib gradient-descent scala-breeze

SGD with L2 regularization in mllib

I am having difficulty reading open source mllib code for SGD with L2 regularization.

The code is

class SquaredL2Updater extends Updater {
override def compute(
  weightsOld: Vector,
  gradient: Vector,
  stepSize: Double,
  iter: Int,
  regParam: Double): (Vector, Double) = {
// add up both updates from the gradient of the loss (= step) as well as
// the gradient of the regularizer (= regParam * weightsOld)
// w' = w - thisIterStepSize * (gradient + regParam * w)
// w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
val thisIterStepSize = stepSize / math.sqrt(iter)
val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector
brzWeights :*= (1.0 - thisIterStepSize * regParam)
brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
val norm = brzNorm(brzWeights, 2.0)

(Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm)
}

The part I am having trouble with is

brzWeights :*= (1.0 - thisIterStepSize * regParam)

the breeze lib has documentation that explains the :*= operator

/** Mutates this by element-wise multiplication of b into this. */
final def :*=[TT >: This, B](b: B)(implicit op: OpMulScalar.InPlaceImpl2[TT, B]): This = {
 op(repr, b)
 repr
}

it looks like its just multiplication of a vector by a scalar.

The formula I found for gradient in case of L2 regularization is

How does the code represent this gradient in this update? Can someone help please.

Solution

Ok, I figured it out. The updater equation is

rearranging terms gives

recognizing the last term is just the gradient

This is equivalent to the code which has

brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)

breaking that out

brzWeights = brzWeights + -thisIterStepSize * gradient.toBreeze

in the previous line, brzWeights :*= (1.0 - thisIterStepSize * regParam)

which means brzWeights = brzWeights * (1.0 - thisIterStepSize * regParam)

so, finally

brzWeights = brzWeights * (1.0 - thisIterStepSize * regParam) + (-thisIterStepSize) * gradient.toBreeze

Now the code and equation match within a normalization factor, which I believe is taken care of in the following line.