
Examples for CNTK Learners


I have been going through Microsoft's Python CNTK tutorials for version 2 Beta 9.0. I haven't found good documentation with examples of recommended values to pass to the different learners available. I have been able to get the following learners working on the CNTK 103: Part B - Feed Forward Network with MNIST tutorial:

    # Plain stochastic gradient descent (SGD)
    lr_per_minibatch = learning_rate_schedule(0.2, UnitType.minibatch)
    trainer = Trainer(z, ce, pe, sgd(z.parameters, lr=lr_per_minibatch))

    # AdaGrad
    lr_per_minibatch = learning_rate_schedule(0.2, UnitType.minibatch)
    trainer = Trainer(z, ce, pe, adagrad(z.parameters, lr=lr_per_minibatch))

    # Adam
    lr_per_minibatch = learning_rate_schedule(0.05, UnitType.minibatch)
    trainer = Trainer(z, ce, pe, adam_sgd(z.parameters, lr=lr_per_minibatch, momentum=momentum_as_time_constant_schedule(700)))

    # SGD with Nesterov momentum
    lr_per_minibatch = learning_rate_schedule(0.2, UnitType.minibatch)
    trainer = Trainer(z, ce, pe, nesterov(z.parameters, lr=lr_per_minibatch, momentum=momentum_as_time_constant_schedule(700)))

    # RMSProp
    lr_per_minibatch = learning_rate_schedule(0.1, UnitType.minibatch)
    trainer = Trainer(z, ce, pe, rmsprop(z.parameters, lr=lr_per_minibatch, gamma=0.90, inc=0.03, dec=0.03, max=0.1, min=0.1))

These work, but does anyone have good examples of recommended values for the parameters that each learner accepts?


Solution

  • For the current learners, the best parameters depend on the data and the problem you are solving, so it is hard to give universal recommendations. One typical piece of advice: if a learning rate works, then all smaller learning rates will also work, but you will have to train longer (i.e. do more sweeps over the data). A small search over learning rates, as sketched below, is usually the quickest way to find a workable value.
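
As a minimal sketch of that advice, one could try progressively smaller learning rates using the same Trainer, sgd and learning_rate_schedule calls from the question. Here z, ce and pe are assumed to be set up as in tutorial 103B, and train_one_sweep is a hypothetical helper that feeds minibatches for one sweep over the training data and returns the average loss; in a real search you would also re-create the model for each trial so every run starts from the same initialization.

    import math

    # Candidate learning rates, largest first; stop at the first one that
    # trains without diverging. Smaller rates should then also work, just
    # more slowly.
    candidate_lrs = [0.8, 0.4, 0.2, 0.1, 0.05]

    for lr in candidate_lrs:
        lr_schedule = learning_rate_schedule(lr, UnitType.minibatch)
        trainer = Trainer(z, ce, pe, sgd(z.parameters, lr=lr_schedule))
        avg_loss = train_one_sweep(trainer)  # hypothetical helper, one pass over the data
        if not math.isnan(avg_loss) and avg_loss < 1e3:  # loss neither NaN nor exploding
            print('learning rate %g looks usable; smaller rates should also work,' % lr)
            print('but they will need more sweeps over the data to converge.')
            break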