
All weights stay zero (not changing) in MNIST sample


I have trained and evaluated the first TensorFlow tutorial example, a softmax regression on MNIST, and the accuracy was around 92% as expected. However, I would like to see the weights and biases at each iteration.

Looking at the code, I see that both are initialized to zero, which is supposedly not an efficient way to initialize:

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

Another question suggested initializing them instead from a truncated normal distribution with a small stddev:

W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.001))
b = tf.Variable(tf.truncated_normal([10], stddev=0.001))
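For reference, here is a minimal NumPy sketch of what tf.truncated_normal does: it draws from a normal distribution and redraws any sample that lands more than two standard deviations from the mean. This is only an illustration of the sampling behaviour, not the TensorFlow implementation.

```python
import numpy as np

def truncated_normal(shape, mean=0.0, stddev=1.0):
    """Draw normal samples, redrawing any value beyond 2 stddev of the mean."""
    samples = np.random.normal(mean, stddev, size=shape)
    out_of_range = np.abs(samples - mean) > 2 * stddev
    while out_of_range.any():
        # redraw only the rejected samples until all lie within 2 stddev
        samples[out_of_range] = np.random.normal(mean, stddev,
                                                 size=out_of_range.sum())
        out_of_range = np.abs(samples - mean) > 2 * stddev
    return samples

W = truncated_normal((784, 10), stddev=0.001)
b = truncated_normal((10,), stddev=0.001)
```

With stddev=0.001 every initial weight is guaranteed to lie in [-0.002, 0.002], which breaks the symmetry of an all-zero start without producing large activations.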

I tested this as well, but in both cases the weights come out unchanged across iterations (all zero in the first case, the same non-zero values in the second); only the biases change.

MWE:

# printed inside the training loop, with the session still active
print "Iteration:", '%04d' % (iteration + 1), "cost=", "{:.9f}".format(avg_cost)
print "Bias: ", b.eval()
print "Weights: ", W.eval()

And here is the result on the first few prints:

Iteration: 0001 cost= 29.819621965
Bias:  [-0.38608965  0.36391538  0.1257894  -0.25784218  0.0603136   1.46654773
 -0.11613362  0.62797612 -1.63218892 -0.25228417]
Weights:  [[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]
Iteration: 0003 cost= 20.975814890
Bias:  [-0.71424055  0.5187394   0.24631855 -0.44207239 -0.07629333  2.24541211
 -0.20360497  1.08866096 -2.26480484 -0.39810511]
Weights:  [[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]

Interestingly enough, I do see some non-zero weights in my TensorBoard viewer:

Can someone explain why I am seeing this behaviour and the mismatch? I would like to see the weights of each layer (in this case there is only one) in TensorFlow and check their values.


Solution

  • When NumPy prints a large array, it summarises it: only the first and last few rows and columns are shown. In the MNIST case those are exactly the weight indices that never update. Every digit is written in the centre of the image, so the border pixels are constant across all input samples, and the weights connected to them receive no gradient. Only the weights tied to the varying centre pixels change, and those are hidden behind the "..." in the summarised printout. TensorBoard looks at the whole tensor, which is why it shows non-zero values.

  • To compare the weights before and after training, you can use numpy.array_equal(w1, w2). To print the whole array instead of the summary, raise NumPy's print threshold:

import numpy
numpy.set_printoptions(threshold=numpy.inf)

(Older NumPy versions accepted threshold='nan'; numpy.inf is the portable spelling.) Alternatively, compare element by element and print only those values that differ by more than a chosen threshold.