from keras.datasets import mnist
from keras import models, layers
from keras.utils import to_categorical
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
network.compile(optimizer='rmsprop',
                loss='mean_squared_error',
                metrics=['accuracy'])
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
network.fit(train_images, train_labels, epochs=5, batch_size=128)
test_loss, test_acc = network.evaluate(test_images, test_labels, batch_size=128)
print("test_acc: ", test_acc)
Epoch 1/5
60000/60000 [==============================] - 2s 41us/step - loss: 0.2600 - acc: 0.9244
Epoch 2/5
60000/60000 [==============================] - 2s 34us/step - loss: 0.1055 - acc: 0.9679
Epoch 3/5
60000/60000 [==============================] - 2s 33us/step - loss: 0.0688 - acc: 0.9791
Epoch 4/5
60000/60000 [==============================] - 2s 35us/step - loss: 0.0504 - acc: 0.9848
Epoch 5/5
60000/60000 [==============================] - 2s 38us/step - loss: 0.0373 - acc: 0.9889
10000/10000 [==============================] - 0s 18us/step
test_acc: 0.9791
Training seems to run without problems, but I'm not sure how the MSE is calculated. In this case, does Keras (or TensorFlow) automatically convert the label encoding to one-hot encoding when calculating MSE?
You have already converted your labels to one-hot encoding manually via:
train_labels = to_categorical(train_labels)
As your softmax layer contains 10 nodes, I will assume you intended to classify 10 labels, meaning train_labels will look something like:
[
[0,0,0,1,0,0,0,0,0,0],... <--- One of these per training row
]
See the documentation on this.
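To make the effect of that conversion concrete, here is a minimal NumPy sketch of one-hot encoding. The `one_hot` helper and the sample labels are made up for illustration; this is not the Keras implementation of to_categorical, only an approximation of what it produces for integer labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: mimics the effect of to_categorical on integer labels."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Put a single 1 in each row, at the column given by that row's label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

sample_labels = [3, 0, 9]  # made-up digit labels, not taken from MNIST
encoded = one_hot(sample_labels, num_classes=10)
print(encoded[0])  # the first row has its single 1 at index 3
```

Each row sums to 1 and has exactly one non-zero entry, which is the shape of target the loss sees.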
The softmax output for that row may look like this (illustrative values only; a real softmax output sums to 1):
[0.033,0.45,0.01,0.9,0,0,0.5,0.4,0.3,0.95]
As explained in this handy resource:
The softmax function will output a probability of class membership for each class label and attempt to best approximate the expected target for a given input.
For example, if the integer encoded class 1 was expected for one example, the target vector would be:
[0, 1, 0]
The softmax output might look as follows, which puts the most weight on class 1 and less weight on the other classes.
[0.09003057 0.66524096 0.24472847]
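For reference, the softmax itself can be sketched in a few lines of NumPy. The raw scores `[1.0, 3.0, 2.0]` are an assumption on my part (the quoted resource does not say which inputs it used); they happen to reproduce the three probabilities quoted above.

```python
import numpy as np

def softmax(logits):
    """Plain NumPy softmax over a 1-D vector of raw scores."""
    logits = np.asarray(logits, dtype="float64")
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = [1.0, 3.0, 2.0]  # assumed raw scores, chosen for illustration
probs = softmax(logits)
print(probs)        # three probabilities, largest at index 1
print(probs.sum())  # softmax outputs always sum to 1 (up to rounding)
```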
And then the mean squared error is calculated on those two sets of data, with the true labels y_true as per the to_categorical output and the predicted labels y_pred being the softmax output from your network.
From the TensorFlow source code for MSE, this works by taking the difference between y_true and y_pred, squaring the result, and then taking the mean, i.e. with the two vectors above:
import tensorflow as tf
from tensorflow.python.keras import backend as K
from tensorflow.python.ops import math_ops
y_true = [0,0,0,1,0,0,0,0,0,0]
y_pred = [0.033,0.45,0.01,0.9,0,0,0.5,0.4,0.3,0.95]
math_ops.squared_difference(y_pred, y_true)
<tf.Tensor: shape=(10,), dtype=float32, numpy=
array([1.0890000e-03, 2.0249999e-01, 9.9999997e-05, 1.0000004e-02,
0.0000000e+00, 0.0000000e+00, 2.5000000e-01, 1.6000001e-01,
9.0000004e-02, 9.0249997e-01], dtype=float32)>
K.mean(math_ops.squared_difference(y_pred, y_true))
<tf.Tensor: shape=(), dtype=float32, numpy=0.1616189>
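If you want to check that number without TensorFlow, the same arithmetic can be sketched in plain NumPy. This only re-does the calculation by hand with the same two vectors; it does not call anything from Keras.

```python
import numpy as np

y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype="float64")
y_pred = np.array([0.033, 0.45, 0.01, 0.9, 0, 0, 0.5, 0.4, 0.3, 0.95])

# Step 1: element-wise squared difference between prediction and target.
squared_diff = (y_pred - y_true) ** 2

# Step 2: average the squared differences to get a single loss value.
mse = squared_diff.mean()
print(mse)  # approximately 0.1616189, matching the TensorFlow result
```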
This is obviously just a single example, but multi-dimensional inputs are handled the same way, as in the simplified example below:
>>> y_true = [[1,0],[0,1]]
>>> y_pred = [[0.95,0.03],[0.3,0.8]]
>>> K.mean(math_ops.squared_difference(y_pred, y_true))
<tf.Tensor: shape=(), dtype=float32, numpy=0.03335>
You can see that the result is a single number every time, and that's your loss.
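As far as I understand it, Keras first averages the squared differences over the last axis to get one loss per sample, and then averages those over the batch. The NumPy sketch below follows that assumption with the two-row example; because every row has the same length, the result is the same as taking one mean over all elements.

```python
import numpy as np

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.95, 0.03], [0.3, 0.8]])

# One loss value per sample: mean of squared differences over the class axis.
per_sample = np.mean((y_pred - y_true) ** 2, axis=-1)

# Batch loss: mean of the per-sample losses.
batch_loss = per_sample.mean()

print(per_sample)  # approximately [0.0017, 0.065]
print(batch_loss)  # approximately 0.03335
```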