python tensorflow moving-average

The result of tf.ExponentialMovingAverage is not as expected


I am trying to understand how tf.train.ExponentialMovingAverage works. Here is my code:

import tensorflow as tf

w1 = tf.constant(10., dtype=tf.float32)
w2 = tf.constant(20., dtype=tf.float32)
w3 = tf.constant(40., dtype=tf.float32)
tf.add_to_collection('w', w1)
tf.add_to_collection('w', w2)
tf.add_to_collection('w', w3)

w = tf.get_collection('w')

ema = tf.train.ExponentialMovingAverage(decay=0.9)
ema_op = ema.apply(w)

with tf.control_dependencies([ema_op]):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in w:
            print(sess.run(ema.average(i)))

The results are:

1.0000002
2.0000005
4.000001

However, according to the formula documented for tf.train.ExponentialMovingAverage, the results should be:

0.9 * 0 + (1 - 0.9) * 10. = 1.0
0.9 * 1.0 + (1 - 0.9) * 20. = 2.9
0.9 * 2.9 + (1 - 0.9) * 40 = 6.61

It seems that tf.train.ExponentialMovingAverage does not update the shadow value from the previous shadow value, but instead computes the moving average independently in each iteration.

Am I misunderstanding something? Any help would be appreciated!


Solution

  • There are some misconceptions in your example:

    1. A moving average is maintained separately for each variable or tensor passed to ema.apply(). You effectively created three independent moving averages, one per constant, which explains the results you are getting.
    2. You have to run ema_op every time you want to update the moving average; each run performs exactly one update step.
    3. For a tf.Variable, the moving average is initialised with the variable's initial value (not necessarily zero, as you are expecting).
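
The difference between the first point and what the question expected can be illustrated without TensorFlow. The following is only a minimal sketch in plain Python: ema_update, independent and chained are hypothetical names introduced here, and the shadow values are assumed to start at 0, which is consistent with the numbers reported in the question.

```python
def ema_update(shadow, value, decay=0.9):
    """One exponential-moving-average step: decay * shadow + (1 - decay) * value."""
    return decay * shadow + (1.0 - decay) * value


values = [10.0, 20.0, 40.0]

# Case A (the question's code): three separate tensors, so three separate
# shadow values, each assumed to start at 0 and each updated exactly once.
independent = [ema_update(0.0, v) for v in values]

# Case B (what the question expected): one shadow value that carries over
# from step to step while the tracked quantity changes.
chained = []
shadow = 0.0
for v in values:
    shadow = ema_update(shadow, v)
    chained.append(shadow)

print([round(x, 4) for x in independent])  # [1.0, 2.0, 4.0]
print([round(x, 4) for x in chained])      # [1.0, 2.9, 6.61]
```

Case A reproduces the values the question observed (up to float rounding), and Case B reproduces the hand calculation, so the formula itself is not at fault; only the number of shadow values differs.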

    The following example behaves as you are expecting:

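    import tensorflow as tf  # TensorFlow 1.x graph-mode API

    # A single variable whose value changes over time; the average tracks it.
    w = tf.Variable(0.0, dtype=tf.float32)
    ema = tf.train.ExponentialMovingAverage(decay=0.9)
    ema_op = ema.apply([w])  # creates the shadow variable and its update op

    assigns = []

    # Each assign runs only after ema_op has run in the same step.
    with tf.control_dependencies([ema_op]):
        for val in [10., 20., 40.]:
            assigns.append(tf.assign(w, tf.constant(val)))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for assign in assigns:
            # Update the average from the current w, then assign the next value.
            _, _, w_ = sess.run([ema_op, assign, ema.average(w)])
            print(w_)
        # One more update to fold the last assigned value (40.) into the average.
        _, w_ = sess.run([ema_op, ema.average(w)])
        print(w_)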
    

    The resulting output is:

    0.0
    1.0000002
    2.9000006
    6.6100016
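
The leading 0.0 in this output follows from the order of operations: because of the control dependency, each step first folds the current value of w into the average and only afterwards assigns the next value. The plain-Python sketch below mimics that order. It is an illustration only: DECAY, w, shadow and printed are hypothetical stand-ins for the TensorFlow objects, and the sketch assumes the average is read after the update in each step, which is what the printed output suggests.

```python
DECAY = 0.9

w = 0.0        # stands in for tf.Variable(0.0)
shadow = w     # the shadow value starts at the variable's initial value

printed = []
for new_value in [10.0, 20.0, 40.0]:
    # ema_op: fold the *current* w into the average ...
    shadow = DECAY * shadow + (1.0 - DECAY) * w
    # ... and only then does the assign change w (control dependency).
    w = new_value
    printed.append(shadow)

# The final run of ema_op alone picks up the last assigned value.
shadow = DECAY * shadow + (1.0 - DECAY) * w
printed.append(shadow)

print([round(x, 4) for x in printed])  # [0.0, 1.0, 2.9, 6.61]
```

Each assigned value therefore only shows up in the average one step later, which is why a fourth update is needed to reach 6.61.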