I have a neural network with three hidden layers, which can be trained with "pure" gradient descent or with some more sophisticated techniques. I also noticed that on my problem, momentum-based optimization methods (Adam, Adadelta, momentum) work much better.
Now to the interesting part. By design, I want to disable momentum in the first layer of the NN. That means I want to update the weights of the second and third layers with Adam, but use simple gradient descent for the first layer.
Of course, I can always write my own optimizer: calculate gradients with tf.gradients(loss, tf.trainable_variables()) and then do the momentum trick myself, roughly as sketched below.
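For reference, the plain-gradient-descent half of that manual route would look something like this (W1, b1 and learning_rate stand in for my actual first-layer variables and hyperparameter):

grads = tf.gradients(loss, [W1, b1])
# plain gradient descent update: w <- w - learning_rate * grad
first_layer_step = tf.group(
    *[v.assign_sub(learning_rate * g) for v, g in zip([W1, b1], grads)])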
But it would be nice to have the option to use different optimizer parameters for each layer. Has anybody heard of a way to do such a thing?
Well, you can pass each optimizer the list of variables it should optimize (docs):
# Adam for the layers that should use momentum (here: second and third layer)
opt = tf.train.AdamOptimizer()
opt_op = opt.minimize(loss, var_list=[W2, b2, W3, b3])

# plain gradient descent for the first layer
opt2 = tf.train.GradientDescentOptimizer(learning_rate)
opt2_op = opt2.minimize(loss, var_list=[W1, b1])
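If you want to perform both updates in a single training step, you can group the two ops (a minimal addition, assuming the TF 1.x graph-mode API used above):

train_op = tf.group(opt_op, opt2_op)
# sess.run(train_op) then applies both updates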
You will have to extract the variable list (probably weights and biases) of a given layer yourself.
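One way to do that, assuming each layer is built inside its own variable scope (the scope name 'layer1' below is just a placeholder for whatever your graph actually uses), is to filter the trainable variables by scope:

# collect the first layer's trainable variables by scope name
first_layer_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                     scope='layer1')
# everything else goes to the momentum-based optimizer
other_vars = [v for v in tf.trainable_variables() if v not in first_layer_vars]
# pass these lists as var_list to the two optimizers above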