Tags: neural-network, backpropagation, stochastic, mlp, adam

Does the Adam optimizer update the weights in every layer?


I'm a newbie in neural networks, so I'm a little confused about the Adam optimizer. For example, I use an MLP with an architecture like this: [architecture diagram]

I've used SGD before, so I want to ask: does updating the weights with the Adam optimizer work the same way as SGD, updating the weights in every layer? In the example above, does that mean there will be 2 weight changes from the output to hidden layer 2, 8 weight changes from hidden layer 2 to hidden layer 1, and finally 4 weight changes from hidden layer 1 to the input? In the examples I've seen, they only update the weights from the output to hidden layer 2.


Solution

  • You can use both SGD and Adam to compute updates for every weight in your network (as long as the loss is differentiable with respect to that weight). If you use TensorFlow or PyTorch and build the model from your sketch, all weights are updated by default whenever you perform an optimizer step, as the first sketch after this answer demonstrates. (If you really want to, you can also restrict the optimizer to a subset of the parameters.)

    The difference between SGD and Adam is that with SGD the weight updates are simple steps in the direction of the (negative) gradient, while with Adam the gradient steps are scaled using running statistics of the previous gradients; the second sketch below shows the two update rules side by side.
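
    The following is a minimal PyTorch sketch of the idea in the first paragraph. Since the original image is not available, the layer sizes (4 inputs, two hidden layers of 4 and 2 units, 1 output) are assumptions chosen only to mirror the weight counts mentioned in the question; the point is that after one `optimizer.step()` every layer's weights and biases have changed, not just those of the last layer.

    ```python
    import torch
    import torch.nn as nn

    # Hypothetical layer sizes (the question's diagram is unavailable):
    # 4 inputs -> hidden layer 1 (4 units) -> hidden layer 2 (2 units) -> 1 output
    model = nn.Sequential(
        nn.Linear(4, 4),   # input -> hidden layer 1
        nn.Sigmoid(),
        nn.Linear(4, 2),   # hidden layer 1 -> hidden layer 2
        nn.Sigmoid(),
        nn.Linear(2, 1),   # hidden layer 2 -> output
    )

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    # Snapshot all parameters before the step
    before = [p.detach().clone() for p in model.parameters()]

    # One training step on a dummy batch
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    # Every parameter tensor (weights and biases of all three layers) has changed
    for (name, p), b in zip(model.named_parameters(), before):
        print(name, "changed:", not torch.equal(p.detach(), b))
    ```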
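
    To make the "scaled using running statistics" point concrete, here is a sketch of the two update rules for a single weight tensor, written with plain tensors instead of `torch.optim`. The hyperparameter values are the common defaults and the function names are my own; this is an illustration of the update math, not the library implementation.

    ```python
    import torch

    def sgd_step(w, grad, lr=0.01):
        # Plain SGD: a step in the direction of the negative gradient
        return w - lr * grad

    def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # Running averages of the gradient (m) and the squared gradient (v)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Bias correction for the early steps (t = 1, 2, ...)
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Each weight's step is rescaled by its own gradient statistics
        w = w - lr * m_hat / (v_hat.sqrt() + eps)
        return w, m, v
    ```

    Both rules are applied independently to every weight tensor in the network, which is why Adam, like SGD, updates all layers, not only the last one.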