
What is the CNTK equivalent of a simple SGD on TensorFlow?


Following the MNIST for ML Beginners tutorial in TensorFlow, we learn the most basic SGD with learning rate 0.5, batch size 100, and 1000 steps like this:

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
...
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

In CNTK, the intuitive equivalent

SGD = {
    minibatchSize = 100
    maxEpochs = 1000
    learningRatesPerMB = 0.5
}

looks like it's doing more computation; at the very least, it is certainly more verbose.

From what I can see, CNTK's concepts of minibatch and epoch differ from TensorFlow's, and so does the way it treats the learning rate.

What would be the direct (or closest possible) equivalent of the basic TensorFlow SGD shown above? How does each concept translate between the two frameworks?


Solution

  • It looks like TensorFlow and CNTK have the same definition of minibatch:

    'Minibatch size' in CNTK means the number of samples processed between model updates
    

    An epoch in CNTK is similar to a step in TensorFlow, i.e. how many times the session runs the train op.

    maxEpochs: maximum number of epochs to run.
    
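    Note that a CNTK epoch covers epochSize samples; if epochSize is set to one minibatch (100 samples here), one epoch corresponds to exactly one TensorFlow step, as the sketch at the end of this answer shows.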

    The learningRatesPerMB is a bit different:

    this will be converted into learningRatesPerSample by dividing the values by the specified 'minibatchSize'
    

    The learningRatesPerSample is similar to TensorFlow's learning rate.
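    For example, with minibatchSize = 100, a learningRatesPerMB of 0.5 converts to learningRatesPerSample = 0.5 / 100 = 0.005.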

    CNTK's documentation on the SGD block: https://github.com/Microsoft/CNTK/wiki/SGD-Block
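
    Putting this together, here is a minimal sketch of the closest CNTK SGD block to the TensorFlow snippet above. It assumes epochSize is set so that one epoch processes exactly one minibatch, making an epoch play the role of a TensorFlow step:

    SGD = {
        epochSize = 100                  # one epoch = 100 samples = one minibatch = one TF step
        minibatchSize = 100              # samples per model update, as in mnist.train.next_batch(100)
        maxEpochs = 1000                 # 1000 single-minibatch epochs ~ 1000 TF steps
        learningRatesPerSample = 0.005   # 0.5 / 100; equivalently, learningRatesPerMB = 0.5
    }

    With these values, training performs 1000 parameter updates of 100 samples each at an effective rate of 0.5 per minibatch, which matches the TensorFlow loop.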