I'm writing a simple logistic regression with TensorFlow. I found that when using tf.nn.softmax, the algorithm converges much more quickly and the final accuracy is higher. If I switch to my own implementation of softmax, the network converges more slowly and the final accuracy is not as good.
Here's the code:
import tensorflow as tf

SEED = 1025
W = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels], seed=SEED))
b = tf.Variable(tf.zeros([num_labels]))
logits = tf.matmul(train_dataset, W) + b
# My softmax (only one of the two y_ assignments below is used in a given run):
y_ = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis=0)
# Tensorflow softmax:
y_ = tf.nn.softmax(logits)
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
loss = -tf.reduce_mean(tf.reduce_sum(train_labels * tf.log(y_clipped), axis=1))
Using my softmax:
Loss at step 0: 22.213934
Training accuracy: 12.7%
Validation accuracy: 13.2%
Loss at step 100: 12.777291
Training accuracy: 45.3%
Validation accuracy: 45.5%
Loss at step 200: 11.361242
Training accuracy: 48.2%
Validation accuracy: 47.4%
Loss at step 300: 10.658278
Training accuracy: 51.4%
Validation accuracy: 49.7%
Loss at step 400: 9.297832
Training accuracy: 59.2%
Validation accuracy: 56.8%
Loss at step 500: 8.902699
Training accuracy: 62.0%
Validation accuracy: 59.2%
Loss at step 600: 8.681184
Training accuracy: 64.2%
Validation accuracy: 61.0%
Loss at step 700: 8.529438
Training accuracy: 65.8%
Validation accuracy: 62.3%
Loss at step 800: 8.416442
Training accuracy: 66.8%
Validation accuracy: 63.3%
Test accuracy: 70.4%
Using TensorFlow's softmax:
Loss at step 0: 13.555875
Training accuracy: 12.7%
Validation accuracy: 14.5%
Loss at step 100: 2.194562
Training accuracy: 72.5%
Validation accuracy: 72.0%
Loss at step 200: 1.808641
Training accuracy: 75.5%
Validation accuracy: 74.5%
Loss at step 300: 1.593390
Training accuracy: 76.8%
Validation accuracy: 75.0%
Loss at step 400: 1.442661
Training accuracy: 77.7%
Validation accuracy: 75.2%
Loss at step 500: 1.327751
Training accuracy: 78.2%
Validation accuracy: 75.4%
Loss at step 600: 1.236314
Training accuracy: 78.5%
Validation accuracy: 75.6%
Loss at step 700: 1.161479
Training accuracy: 78.9%
Validation accuracy: 75.6%
Loss at step 800: 1.098717
Training accuracy: 79.4%
Validation accuracy: 75.8%
Test accuracy: 83.3%
From the documentation, in theory TensorFlow's softmax should be exactly the same as what I implemented, no?
This function performs the equivalent of
softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis)
EDIT: I added a seed when initializing from the normal distribution, and now I can reproduce the accuracy results. When setting the axis value in the "My softmax" line, only axis=0 runs without error; setting axis=1 or axis=-1 both result in this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 10 and 10000 for 'truediv' (op: 'RealDiv') with input shapes: [10000,10], [10000].
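For reference, here is a small sketch of the shapes involved (assuming logits has shape [10000, 10], i.e. 10000 examples and 10 classes; the keepdims variant at the end is only an illustration, and on older TF 1.x the argument is spelled keep_dims):

import tensorflow as tf

logits = tf.placeholder(tf.float32, shape=[10000, 10])
exp = tf.exp(logits)

sum0 = tf.reduce_sum(exp, axis=0)  # shape [10]: sums over the 10000 examples, broadcasts against [10000, 10]
sum1 = tf.reduce_sum(exp, axis=1)  # shape [10000]: sums over the 10 classes
# exp / sum1 raises the InvalidArgumentError above, because [10000, 10] and [10000]
# are not broadcast-compatible.
sum1_kept = tf.reduce_sum(exp, axis=1, keepdims=True)  # shape [10000, 1]
row_softmax = exp / sum1_kept  # builds without error; each row sums to 1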
W = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels]))
introduces randomness into your program, because the weights are initialized randomly, so every time you run the program you will get different results. The tf.truncated_normal function does take a seed argument; you can use that argument and see what the outcomes are.
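A minimal sketch of what that could look like (TF 1.x-style, mirroring the question's snippet; the placeholder image_size value and the graph-level tf.set_random_seed line are assumptions added just so it stands alone):

import tensorflow as tf

image_size = 28   # placeholder value so the snippet runs on its own
num_labels = 10   # matches the [10000, 10] logits shape in the question

SEED = 1025
tf.set_random_seed(SEED)  # optional: graph-level seed for any other random ops
W = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels], seed=SEED))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(W)[0, :3])  # same values on every run, since the seed is fixed

With the seed fixed, consecutive runs start from the same initial weights, so the loss and accuracy numbers become directly comparable between runs.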