Tags: python, tensorflow, softmax

Tensorflow tf.nn.softmax() function performs much better than hand-written softmax


I'm writing a simple logistic regression with tensorflow. I found that when using tf.nn.softmax, the algorithm converges much faster and the final accuracy is higher. If I switch to my own implementation of softmax, the network converges more slowly and the final accuracy is not as good.

Here's the code:

import tensorflow as tf

SEED = 1025
W = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels], seed=SEED))
b = tf.Variable(tf.zeros([num_labels]))
logits = tf.matmul(train_dataset, W) + b

# My softmax:
y_ = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis=0)
# Tensorflow softmax: 
y_ = tf.nn.softmax(logits)

y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
loss = -tf.reduce_mean(tf.reduce_sum(train_labels * tf.log(y_clipped), axis=1))

Using my softmax:

Loss at step 0: 22.213934
Training accuracy: 12.7%
Validation accuracy: 13.2%
Loss at step 100: 12.777291
Training accuracy: 45.3%
Validation accuracy: 45.5%
Loss at step 200: 11.361242
Training accuracy: 48.2%
Validation accuracy: 47.4%
Loss at step 300: 10.658278
Training accuracy: 51.4%
Validation accuracy: 49.7%
Loss at step 400: 9.297832
Training accuracy: 59.2%
Validation accuracy: 56.8%
Loss at step 500: 8.902699
Training accuracy: 62.0%
Validation accuracy: 59.2%
Loss at step 600: 8.681184
Training accuracy: 64.2%
Validation accuracy: 61.0%
Loss at step 700: 8.529438
Training accuracy: 65.8%
Validation accuracy: 62.3%
Loss at step 800: 8.416442
Training accuracy: 66.8%
Validation accuracy: 63.3%
Test accuracy: 70.4%

Using tensorflow's softmax:

Loss at step 0: 13.555875
Training accuracy: 12.7%
Validation accuracy: 14.5%
Loss at step 100: 2.194562
Training accuracy: 72.5%
Validation accuracy: 72.0%
Loss at step 200: 1.808641
Training accuracy: 75.5%
Validation accuracy: 74.5%
Loss at step 300: 1.593390
Training accuracy: 76.8%
Validation accuracy: 75.0%
Loss at step 400: 1.442661
Training accuracy: 77.7%
Validation accuracy: 75.2%
Loss at step 500: 1.327751
Training accuracy: 78.2%
Validation accuracy: 75.4%
Loss at step 600: 1.236314
Training accuracy: 78.5%
Validation accuracy: 75.6%
Loss at step 700: 1.161479
Training accuracy: 78.9%
Validation accuracy: 75.6%
Loss at step 800: 1.098717
Training accuracy: 79.4%
Validation accuracy: 75.8%
Test accuracy: 83.3%

From the documentation, tensorflow's softmax should in theory be exactly the same as what I implemented, shouldn't it?

This function performs the equivalent of

softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis)

EDIT: I added a seed when initializing from the normal distribution, so I can now reproduce the accuracy results. When setting the axis value in the "My softmax" line, only axis=0 runs without an error. Setting axis=1 or axis=-1 both result in this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 10 and 10000 for 'truediv' (op: 'RealDiv') with input shapes: [10000,10], [10000].
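
Presumably this is a shape/broadcasting issue. With logits of shape [10000, 10] (10000 examples, 10 classes), a sum over axis=0 has shape [10] and silently broadcasts against [10000, 10], while a sum over axis=1 has shape [10000] and cannot, unless the reduced dimension is kept. A small sketch of the shapes involved (assuming a TF version whose reduce_sum accepts keepdims; older 1.x releases spell it keep_dims):

exp_logits = tf.exp(logits)                           # shape [10000, 10]
tf.reduce_sum(exp_logits, axis=0)                     # shape [10]       -> broadcasts, no error, but normalizes over the batch
tf.reduce_sum(exp_logits, axis=1)                     # shape [10000]    -> cannot broadcast against [10000, 10]: InvalidArgumentError
tf.reduce_sum(exp_logits, axis=1, keepdims=True)      # shape [10000, 1] -> broadcasts row by row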

Solution

    • Assuming your softmax implementation is correct, it is not entirely fair to compare it against tensorflow's softmax in the first place, because your program contains randomness.
    • The line W = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels])) initializes the weights randomly, so every run of the program produces different results.
    • You can only compare the two softmax versions if they start from the same seed (the same starting point).
    • If you repeat the experiment several times and the tensorflow softmax beats the handwritten one every time, then the comparison is meaningful.
    • tf.truncated_normal accepts a seed argument; pass it and compare the outcomes.
    • With the same seed, a correct handwritten softmax and tf.nn.softmax should produce the same results.
    • That said, your axis should be 1 (the last axis), because the softmax must normalize over the class dimension. With axis=0 the sum has shape [10] and silently broadcasts, so each class column is normalized over the whole batch instead of each example over its 10 classes, which is why training degrades. With axis=1 the sum has shape [10000] and cannot broadcast against [10000, 10] unless the reduced dimension is kept, hence the error; see the sketch below.
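
A corrected handwritten softmax, as a minimal sketch: the sum runs over the class axis (axis=-1) and keeps the reduced dimension so the division broadcasts row by row, and the per-row maximum is subtracted first, the usual trick for keeping tf.exp from overflowing (this assumes a TF version whose reduce ops accept keepdims; older 1.x releases spell it keep_dims):

def my_softmax(logits):
    # Subtract the per-row maximum; this leaves the softmax unchanged but keeps tf.exp from overflowing.
    shifted = logits - tf.reduce_max(logits, axis=-1, keepdims=True)
    exp_shifted = tf.exp(shifted)
    # Normalize over the class axis and keep the reduced dimension so the division broadcasts per example.
    return exp_shifted / tf.reduce_sum(exp_shifted, axis=-1, keepdims=True)

y_ = my_softmax(logits)  # should now match tf.nn.softmax(logits) up to floating-point error

With this change, the handwritten version and tf.nn.softmax should converge the same way under the same seed.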