In caffe I create a simple network to classifying face images as follows:
myExampleNet.prototxt
name: "myExample"
layer {
name: "example"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/myExample/myExample_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/myExample/myExample_test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "data"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 50
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 155
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
myExampleSolver.prototxt
net: "examples/myExample/myExampleNet.prototxt"
test_iter: 15
test_interval: 500
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 30000
snapshot: 5000
snapshot_prefix: "examples/myExample/myExample"
solver_mode: CPU
I use convert_imageset
of caffe to create LMDB database and my data has about 40000 training and 16000 testing data in face. 155 cases and each one has about 260 and 100 images of train and test respectively.
I use this command for training data:
build/tools/convert_imageset -resize_height=100 -resize_width=100 -shuffle examples/myExample/myData/data/ examples/myExample/myData/data/labels_train.txt examples/myExample/myExample_train_lmdb
and this command for test data:
build/tools/convert_imageset -resize_height=100 -resize_width=100 -shuffle examples/myExample/myData/data/ examples/myExample/myData/data/labels_test.txt examples/myExample/myExample_test_lmdb
But after 30000 iterations my loss is high and the accuracy is low:
...
I0127 09:25:55.602881 27305 solver.cpp:310] Iteration 30000, loss = 4.98317
I0127 09:25:55.602917 27305 solver.cpp:330] Iteration 30000, Testing net (#0)
I0127 09:25:55.602926 27305 net.cpp:676] Ignoring source layer example
I0127 09:25:55.827739 27305 solver.cpp:397] Test net output #0: accuracy = 0.0126667
I0127 09:25:55.827764 27305 solver.cpp:397] Test net output #1: loss = 5.02207 (* 1 = 5.02207 loss)
and when I change my dataset to mnist and change the ip2
layer num_output
from 155 to 10, the loss is dramatically reduced and accuracy increases!
Which part is wrong?
There is not necessarily something wrong in your code.
The fact that you get these good results for MNIST says indeed that you have a model that is 'correct' in the sense that it does not produce coding errors etc, but it is in no way any guarantee that it will perform well in another, different problem.
Keep in mind that, in principle, it is much easier to predict a 10-class problem (like MNIST) than a 155-class one; the baseline (i.e. simple random guessing) accuracy in the first case is about 10%, while for the second case is only ~ 0.65%. Add that your data size (comparable to MNIST) is not bigger either (are they also color pictures, i.e. 3-channels in contrast with the single-channel MNIST?), and your results may start looking not that puzzling and surprising.
Additionally, it has turned out that MNIST is notoriously easy to fit (I have been trying myself to build models that will not fit MNIST well, without much success so far), and you easily reach a conclusion that has now become common wisdom in the community, i.e. that good performance on MNIST does not say really much for a model architecture.