Tags: python, cntk

CNTK create_trainer equations


In the following CNTK create_trainer handler (in Python), I am trying to understand what these two expressions mean. I believe this is the learning rate per minibatch, but the rest is not commented by Microsoft. Does anyone understand these lr_per_mb expressions and their significance?

lr_per_mb = [1.0]*80+[0.1]*40+[0.01]

lr_per_mb = [0.1]*1+[1.0]*80+[0.1]*40+[0.01]

def create_trainer(network, minibatch_size, epoch_size, num_quantization_bits, block_size, warm_up, progress_printer):
    if network['name'] == 'resnet20':
        lr_per_mb = [1.0]*80 + [0.1]*40 + [0.01]
    elif network['name'] == 'resnet110':
        lr_per_mb = [0.1]*1 + [1.0]*80 + [0.1]*40 + [0.01]
    else:
        raise RuntimeError("Unknown model name!")

Solution

  • The syntax [a1]*b + [a2]*d + [a3] means the learner will use a learning rate of a1 for the first b iterations (epochs or samples, depending on your trainer setup), then a learning rate of a2 for the next d iterations, and a learning rate of a3 for all remaining iterations.

    Typically you would start with a high learning rate and lower it as training proceeds, which is exactly what you see in the code above. Note also that the schedules differ between the two networks. A lot of effort goes into finding the right learning-rate schedule, so using the numbers from these examples as an initial starting point can save you a lot of time.
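    To make the semantics concrete, the resnet20 schedule can be expanded with plain Python list arithmetic (the expression is ordinary list repetition and concatenation). The `lr_at` helper below is hypothetical, not part of CNTK; it just illustrates the usual schedule convention that the last value applies to every iteration past the end of the list:

    ```python
    # Expand the resnet20 schedule: plain Python list arithmetic.
    lr_per_mb = [1.0] * 80 + [0.1] * 40 + [0.01]

    # 80 entries of 1.0, then 40 entries of 0.1, then a final 0.01.
    assert len(lr_per_mb) == 121
    assert lr_per_mb[0] == 1.0 and lr_per_mb[79] == 1.0
    assert lr_per_mb[80] == 0.1 and lr_per_mb[119] == 0.1
    assert lr_per_mb[120] == 0.01

    # Hypothetical helper mirroring how such a schedule is consumed:
    # past the end of the list, the last value persists.
    def lr_at(schedule, i):
        return schedule[min(i, len(schedule) - 1)]

    print(lr_at(lr_per_mb, 0))    # 1.0  (first epoch)
    print(lr_at(lr_per_mb, 100))  # 0.1  (epoch 101)
    print(lr_at(lr_per_mb, 500))  # 0.01 (past the end: last value persists)
    ```

    Read this way, the resnet110 schedule's extra leading `[0.1]*1` is a one-epoch warm-up at a lower rate before the main `1.0` phase begins.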