I am new to MXNet and want to work through a simple example that uses a one-layer network to solve the digit classification problem. My program is as follows:
import math
import numpy as np
import mxnet as mx
import matplotlib.pyplot as plt
import logging
with np.load("notMNIST.npz") as data:
    images, labels = data["images"], data["labels"]
# Reshape the images from 28x28 into 784-element 1D arrays and flatten the labels.
images = images.reshape(784, 18720)
labels = labels.reshape(18720)
# Apply one-hot encoding.
Images = images.T.astype(np.float32)
Labels = np.zeros((18720, 10)).astype(np.float32)
Labels[np.arange(18720), labels] = 1
# Segment the data into training, evaluation and testing.
X_train = Images[0 : 15000]
y_train = Labels[0 : 15000]
X_eval = Images[15000 : 16000]
y_eval = Labels[ 1200 : 2200] # IMPORTANT!!!
X_test = Images[16000 : 18720]
y_test = Labels[16000 : 18720]
train_iter = mx.io.NDArrayIter(X_train, y_train, 100, shuffle=False)
_eval_iter = mx.io.NDArrayIter(X_eval , y_eval , 100, shuffle=False)
# Variables
X = mx.sym.Variable(name='data')
# Neural Network Layers
fully_connected_layer = mx.sym.FullyConnected(data=X, name='fc1', num_hidden=10)
# Outputs
lro = mx.sym.SoftmaxOutput(data=fully_connected_layer, name="softmax")
model = mx.mod.Module(symbol=lro)
model.fit(train_data=train_iter, eval_data=_eval_iter,
          optimizer='sgd', optimizer_params={
              'learning_rate' : 1e-5,
              'momentum' : 0.1},
)
After running the program with evaluation labels 15000 to 16000, the final step reports a validation accuracy of 97%, which I would argue is too high for a one-layer network. Therefore, I deliberately changed the evaluation labels to 1200 to 2200 and saw that the program still reports an accuracy of around 83~86% (at first I thought it might just be a coincidence and tried several different evaluation label ranges, but still got similar results).
What mistakes have I made in my program?
You can fix the problem by not one-hot encoding the labels.
Instead of passing Labels[0:15000], Labels[15000:16000] and Labels[16000:18720], pass labels[0:15000], labels[15000:16000] and labels[16000:18720].
This will decrease your accuracy to a mediocre 0.796000 on the proper evaluation labels, and down to 0.095000 on your "random" evaluation labels.
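A minimal sketch of that change, using a small synthetic array in place of notMNIST.npz (the stand-in data and the names one_hot and recovered are assumptions made for illustration, not part of the original program). It shows that slicing the integer labels array gives one class index per sample, and that np.argmax recovers those indices if only a one-hot array is at hand:

```python
import numpy as np

# Synthetic stand-in for the 18720 integer class labels (0..9) of the question.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=18720)

# One-hot encoding as done in the question, kept here only for comparison.
one_hot = np.zeros((18720, 10), dtype=np.float32)
one_hot[np.arange(18720), labels] = 1

# Pass integer class indices instead: slice `labels`, not the one-hot array.
y_train = labels[0:15000]
y_eval = labels[15000:16000]
y_test = labels[16000:18720]

# If only the one-hot array is available, argmax along the class axis
# recovers the integer labels.
recovered = one_hot.argmax(axis=1)

print(y_train.shape, y_eval.shape, y_test.shape)  # (15000,) (1000,) (2720,)
print(np.array_equal(recovered, labels))          # True
```

These one-dimensional arrays would then take the place of y_train, y_eval and y_test when building the NDArrayIter objects.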
Detailed answer
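You get such high accuracy because of a misleading calculation in mxnet.metric.Accuracy. Internally, the Accuracy metric can work in 2 "modes", depending on the shapes of the provided arguments "preds" and "labels":
1) If the shapes of "preds" and "labels" do not match, then Accuracy treats each row of "preds" as the class probabilities of one sample, with the index of an item being its class.
For example, if you have preds=[[0.1, 0.9], [0.8, 0.2]] then it means that the first sample belongs to class 0 with probability 0.1 and to class 1 with probability 0.9, and the second sample belongs to class 0 with probability 0.8 and to class 1 with probability 0.2.
Working in this mode, "labels" is expected to be an array of real classes. In our case, imagining that the model is absolutely correct, the "labels" array should have been [1, 0].
2) If the shapes of "preds" and "labels" do match, then Accuracy treats the arrays as predicted classes and real classes. So each item is treated as the class of one sample. The calculation is then done by comparing the items of "preds" and "labels" that have the same indices.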
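The two modes can be sketched in plain NumPy as follows. This is an illustration of the behaviour this answer describes, not MXNet's actual source; accuracy_sketch is a hypothetical helper, and the integer cast it applies is the conversion this answer attributes to Accuracy:

```python
import numpy as np

def accuracy_sketch(preds, labels):
    """Hypothetical helper mimicking the two modes described in this answer."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    if preds.shape != labels.shape:
        # Mode 1: rows are class probabilities; pick the most likely class.
        preds = preds.argmax(axis=1)
    # Mode 2 (and the tail of mode 1): cast to int and compare item by item.
    preds = preds.astype("int32").ravel()
    labels = labels.astype("int32").ravel()
    return float((preds == labels).sum()) / len(preds)

probs = [[0.1, 0.9], [0.8, 0.2]]

# Mode 1: integer labels, shapes differ, argmax gives [1, 0].
print(accuracy_sketch(probs, [1, 0]))            # 1.0

# Mode 2: one-hot labels, shapes match, probabilities truncate to 0.
print(accuracy_sketch(probs, [[0, 1], [1, 0]]))  # 0.5
```

In the second call the model is still perfectly right, yet the score drops to 0.5, because four individual items are compared instead of two samples.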
When you apply one-hot encoding to the labels, the second mode of calculation is used, because the shape of the model's predictions matches the shape of the one-hot encoded labels. Accuracy interprets each item in the arrays as a standalone sample and compares them with each other.
Internally, Accuracy converts the float arrays to int, which always produces 0 for floats less than 1. That behavior essentially converts all predictions to 0, except in the rare case when there is a class with probability 1.0. So in the majority of cases we get preds = [0, 0, ..., 0].
A one-hot encoded array has all items except one equal to 0, meaning we would have something like [0, 1, 0, ..., 0].
When Accuracy compares these two arrays, it finds that they are mostly equal, differing in only one place, and returns a misleadingly high accuracy.
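The effect of the integer conversion on a single sample can be checked with a short NumPy sketch. The helper matching_fraction and the probability rows are made up for illustration, assuming 10 classes as in the question:

```python
import numpy as np

def matching_fraction(probs, one_hot):
    """Fraction of equal items after truncating both arrays to int."""
    p = np.asarray(probs, dtype=np.float32).astype("int32").ravel()
    t = np.asarray(one_hot, dtype=np.float32).astype("int32").ravel()
    return float((p == t).mean())

target = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # true class is 2

# Probabilities below 1.0: every item truncates to 0.
right = [0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
wrong = [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.91, 0.01, 0.01]

# Rare case: one class has probability exactly 1.0 and survives the cast.
saturated_right = [0, 0, 1.0, 0, 0, 0, 0, 0, 0, 0]
saturated_wrong = [0, 0, 0, 0, 0, 1.0, 0, 0, 0, 0]

print(matching_fraction(right, target))            # 0.9
print(matching_fraction(wrong, target))            # 0.9
print(matching_fraction(saturated_right, target))  # 1.0
print(matching_fraction(saturated_wrong, target))  # 0.8
```

A right and a wrong prediction both score 0.9 here, and even a fully confident wrong prediction still scores 0.8, so the score says very little about which labels were supplied.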
Here is a simple reproducing example:
import mxnet as mx
# Softmax output for one sample: class 8 has probability of about 0.98.
predicts = mx.nd.array([[1.29206967e-09, 3.40120096e-05, 2.23299547e-12, 3.98692492e-07,
                         1.21151755e-10, 2.59370694e-08, 1.95488334e-02, 1.13474562e-05,
                         9.80405331e-01, 3.51648767e-12]])
# One-hot label for class 8, with the same shape as the predictions.
labels = mx.nd.array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
acc = mx.metric.Accuracy()
acc.update(preds=predicts, labels=labels)
print(acc.get())
This will give us
('accuracy', 0.90000000000000002)
because the one-hot encoding contains exactly 1 non-zero element, so 9 of the 10 compared items match.