I'm trying to understand the SGD optimization code in the Keras optimizers (source code). In the get_updates method, we have:
# momentum
shapes = [K.int_shape(p) for p in params]
moments = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + moments
for p, g, m in zip(params, grads, moments):
    v = self.momentum * m - lr * g  # velocity
    self.updates.append(K.update(m, v))
where K = keras.backend. Now, since moments is set to a list of zero tensors and m is an element of this list, why doesn't m always evaluate to a zero tensor in the line v = self.momentum * m - lr * g?
I looked up the code for keras.backend.zeros for the TensorFlow backend (source code), and keras.backend.zeros returns tf.zeros, which apparently returns a constant tensor of zeros. (Edit: or returns a tf.Variable initialized with tf.zeros if a shape is specified.)
My intuition was that it would return something like tf.get_variable() with a zeros initializer, so that the tensor would not be overwritten each time; instead, a tensor named m would just keep getting updated by K.update().
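To make that intuition concrete, this is roughly the pattern I had in mind (just a hypothetical sketch in TF 1.x graph mode; m is an arbitrary name, not what Keras actually does):

import tensorflow as tf

# A persistent variable with a zeros initializer...
m = tf.get_variable('m', shape=(3, 4), initializer=tf.zeros_initializer())
# ...that an assign op keeps updating in place (stand-in for K.update(m, v))
update_m = tf.assign(m, m + 1.0)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(update_m)
    print(sess.run(m))  # no longer zeros -- the same variable persists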
So does tf.zeros() actually behave like tf.get_variable() with a zero initialization? Is there something else I am missing?
Edit: Even if a shape is specified, the source code linked above still seems to return a new tensor variable rather than reuse an existing one (i.e. via get_variable()), which would be difficult anyway since no name was specified. I'm still confused as to why the existing variable is reused as opposed to a new tensor variable of zeros being returned.
I think you looked up the wrong K.zeros function. Here's the source code in Keras 2.1 (keras/backend/tensorflow_backend.py):
def zeros(shape, dtype=None, name=None):
    """Instantiates an all-zeros variable and returns it.

    # Arguments
        shape: Tuple of integers, shape of returned Keras variable
        dtype: String, data type of returned Keras variable
        name: String, name of returned Keras variable

    # Returns
        A variable (including Keras metadata), filled with `0.0`.

    # Example
    ```python
        >>> from keras import backend as K
        >>> kvar = K.zeros((3,4))
        >>> K.eval(kvar)
        array([[ 0.,  0.,  0.,  0.],
               [ 0.,  0.,  0.,  0.],
               [ 0.,  0.,  0.,  0.]], dtype=float32)
    ```
    """
    if dtype is None:
        dtype = floatx()
    tf_dtype = tf.as_dtype(dtype)
    return variable(tf.constant_initializer(0., dtype=tf_dtype)(shape),
                    dtype, name)
As you can see, it actually returns a variable initialized with zeros, not a constant zeros tensor. The documentation states the same:
Instantiates an all-zeros variable and returns it.
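If it helps, here is a minimal sketch (assuming the TensorFlow backend; the names m, update_op and f are just for illustration) showing that the variable returned by K.zeros keeps its state across update ops, which is exactly what the momentum accumulators in get_updates rely on:

from keras import backend as K

m = K.zeros((2, 2))             # a variable, not a constant zeros tensor
update_op = K.update(m, m + 1)  # same kind of op SGD appends to self.updates

# A backend function that applies the update op each time it is called
f = K.function([], [m], updates=[update_op])
f([])
f([])
print(K.eval(m))  # [[2., 2.], [2., 2.]] -- the accumulator persisted across calls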
Edit: the answer to the follow-up question.
This is actually a very good observation: you are right, subsequent calls to Optimizer.get_updates(loss, params) would create new variables, assign new ops to self.updates and new weights to self.weights. In some sense, the get_updates method is part of the optimizer's constructor.
But here's how it works: this method is called exactly once per model instance. It returns the list of update ops that are then applied many times in a loop, for different batches, but the ops themselves stay the same. Here's the relevant code of the Model class (keras/engine/training.py):
def _make_train_function(self):
    ...
    if self.train_function is None:
        ...
        with K.name_scope('training'):
            with K.name_scope(self.optimizer.__class__.__name__):
                training_updates = self.optimizer.get_updates(
                    params=self._collected_trainable_weights,
                    loss=self.total_loss)
            updates = self.updates + training_updates + self.metrics_updates
            # Gets loss and metrics. Updates weights at each call.
            self.train_function = K.function(inputs,
                                             [self.total_loss] + self.metrics_tensors,
                                             updates=updates,
                                             name='train_function',
                                             **self._function_kwargs)
self.optimizer.get_updates(...) is called exactly once to construct the train_function.
Feel free to examine other optimizers and check that they all prepare their weights and update ops inside the get_updates() method.
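To see the "built once, applied many times" pattern end to end, here is a rough sketch with a toy model (hypothetical example, TensorFlow backend): get_updates runs once when the train function is first built, and every subsequent batch reuses the same update ops and the same moment variables.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

model = Sequential([Dense(1, input_dim=4)])
opt = SGD(lr=0.01, momentum=0.9)
model.compile(optimizer=opt, loss='mse')

x = np.random.rand(32, 4)
y = np.random.rand(32, 1)

# get_updates() is invoked once, when the train function is first built;
# each train_on_batch call then reruns the same update ops.
for _ in range(5):
    model.train_on_batch(x, y)

# The momentum accumulator for the Dense kernel is now non-zero,
# because the same moment variables were updated batch after batch.
print(opt.get_weights()[1])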