Tags: neural-network, backpropagation, stochastic, mlp, adam

Does the Adam optimizer update the weights in every layer?


I'm a newbie in neural networks, so I'm a little confused about the Adam optimizer. For example, I use an MLP with an architecture like this: [architecture diagram]

I've used SGD before, so I want to ask: does updating the weights with the Adam optimizer work the same way as SGD, updating the weights in every layer? In the example above, does that mean there will be 2 weight changes from the output to hidden layer 2, 8 weight changes from hidden layer 2 to hidden layer 1, and finally 4 weight changes from hidden layer 1 to the input? In the examples I've seen, they only update the weights from the output to hidden layer 2.


Solution

  • You can use both SGD and Adam to compute updates for every weight in your network (as long as the loss is differentiable with respect to that weight). If you use TensorFlow or PyTorch and build the model from your sketch, all weights are updated by default whenever you perform an optimizer step, as the first sketch after this answer demonstrates. (If you really want to, you can also restrict the optimizer to a subset of the parameters.)

    The difference between SGD and Adam is that with SGD the weight updates are simple steps in the direction of the (negative) gradient, while with Adam the gradient steps are scaled using running statistics of the previous gradients; the second sketch below shows the two update rules side by side.
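
    The following is a minimal PyTorch sketch of the idea in the first paragraph. Since the original image is not available, the layer sizes (4 inputs, two hidden layers of 4 and 2 units, 1 output) are assumptions chosen only to mirror the weight counts mentioned in the question; the point is that after one `optimizer.step()` every layer's weights and biases have changed, not just those of the last layer.

    ```python
    import torch
    import torch.nn as nn

    # Hypothetical layer sizes (the question's diagram is unavailable):
    # 4 inputs -> hidden layer 1 (4 units) -> hidden layer 2 (2 units) -> 1 output
    model = nn.Sequential(
        nn.Linear(4, 4),   # input -> hidden layer 1
        nn.Sigmoid(),
        nn.Linear(4, 2),   # hidden layer 1 -> hidden layer 2
        nn.Sigmoid(),
        nn.Linear(2, 1),   # hidden layer 2 -> output
    )

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    # Snapshot all parameters before the step
    before = [p.detach().clone() for p in model.parameters()]

    # One training step on a dummy batch
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    # Every parameter tensor (weights and biases of all three layers) has changed
    for (name, p), b in zip(model.named_parameters(), before):
        print(name, "changed:", not torch.equal(p.detach(), b))
    ```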
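
    To make the "scaled using running statistics" point concrete, here is a sketch of the two update rules for a single weight tensor, written with plain tensors instead of `torch.optim`. The hyperparameter values are the common defaults and the function names are my own; this is an illustration of the update math, not the library implementation.

    ```python
    import torch

    def sgd_step(w, grad, lr=0.01):
        # Plain SGD: a step in the direction of the negative gradient
        return w - lr * grad

    def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # Running averages of the gradient (m) and the squared gradient (v)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Bias correction for the early steps (t = 1, 2, ...)
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Each weight's step is rescaled by its own gradient statistics
        w = w - lr * m_hat / (v_hat.sqrt() + eps)
        return w, m, v
    ```

    Both rules are applied independently to every weight tensor in the network, which is why Adam, like SGD, updates all layers, not only the last one.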