Search code examples
apache-sparkapache-spark-mllibgradient-descentscala-breeze

SGD with L2 regularization in mllib


I am having difficulty reading open source mllib code for SGD with L2 regularization.

The code is

class SquaredL2Updater extends Updater {
override def compute(
  weightsOld: Vector,
  gradient: Vector,
  stepSize: Double,
  iter: Int,
  regParam: Double): (Vector, Double) = {
// add up both updates from the gradient of the loss (= step) as well as
// the gradient of the regularizer (= regParam * weightsOld)
// w' = w - thisIterStepSize * (gradient + regParam * w)
// w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
val thisIterStepSize = stepSize / math.sqrt(iter)
val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector
brzWeights :*= (1.0 - thisIterStepSize * regParam)
brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
val norm = brzNorm(brzWeights, 2.0)

(Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm)
}

The part I am having trouble with is

brzWeights :*= (1.0 - thisIterStepSize * regParam)

the breeze lib has documentation that explains the :*= operator

/** Mutates this by element-wise multiplication of b into this. */
final def :*=[TT >: This, B](b: B)(implicit op: OpMulScalar.InPlaceImpl2[TT, B]): This = {
 op(repr, b)
 repr
}

it looks like its just multiplication of a vector by a scalar.

The formula I found for gradient in case of L2 regularization is

L2 gradient

How does the code represent this gradient in this update? Can someone help please.


Solution

  • Ok, I figured it out. The updater equation is

    enter image description here

    rearranging terms gives

    enter image description here

    recognizing the last term is just the gradient

    enter image description here

    This is equivalent to the code which has

    brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
    

    breaking that out

    brzWeights = brzWeights + -thisIterStepSize * gradient.toBreeze
    

    in the previous line, brzWeights :*= (1.0 - thisIterStepSize * regParam)

    which means brzWeights = brzWeights * (1.0 - thisIterStepSize * regParam)

    so, finally

    brzWeights = brzWeights * (1.0 - thisIterStepSize * regParam) + (-thisIterStepSize) * gradient.toBreeze
    

    Now the code and equation match within a normalization factor, which I believe is taken care of in the following line.