I have been using the TensorFlow Object Detection API on my own dataset. While training, I want to know how well the network is learning from the training set. So I want to run evaluation on both the training set and the eval set during training and get the accuracy (mAP) for each.
My config file:
model {
  faster_rcnn {
    num_classes: 50
    image_resizer {
      fixed_shape_resizer {
        height: 960
        width: 960
      }
    }
    number_of_stages: 3
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.00999999977648
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.699999988079
    first_stage_max_proposals: 100
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        conv_hyperparams {
          op: CONV
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.00999999977648
            }
          }
        }
        predict_instance_masks: true
        mask_height: 33
        mask_width: 33
        mask_prediction_conv_depth: 0
        mask_prediction_num_conv_layers: 4
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.300000011921
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_mask_prediction_loss_weight: 4.0
  }
}
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.003
          schedule {
            step: 3000
            learning_rate: 0.00075
          }
          schedule {
            step: 6000
            learning_rate: 0.000300000014249
          }
          schedule {
            step: 15000
            learning_rate: 0.000075
          }
          schedule {
            step: 18000
            learning_rate: 0.0000314249
          }
          schedule {
            step: 900000
            learning_rate: 2.99999992421e-05
          }
          schedule {
            step: 1200000
            learning_rate: 3.00000010611e-06
          }
        }
      }
      momentum_optimizer_value: 0.899999976158
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "./mask_rcnn_resnet101_atrous_coco/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
train_input_reader: {
  label_map_path: "./map901_label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
  tf_record_input_reader {
    input_path: ["./my_coco_train.record-?????-of-00005"]
  }
}
eval_config: {
  num_examples: 8000
  max_evals: 100
  num_visualizations: 25
}
eval_input_reader: {
  label_map_path: "./map901_label_map.pbtxt"
  shuffle: false
  load_instance_masks: true
  mask_type: PNG_MASKS
  num_readers: 1
  tf_record_input_reader {
    input_path: ["./my_coco_val.record-?????-of-00001"]
  }
}
I ran the script with these parameters:
python model_main.py --alsologtostderr \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${TRAIN_DIR} \
--num_train_steps=24000 \
--sample_1_of_n_eval_on_train_examples=25 \
--num_eval_steps=100 \
--sample_1_of_n_eval_examples=1
I think this runs an evaluation on the eval examples. To evaluate on the training data as well (to check how well the features of the training set have been captured), I added
--eval_training_data=True
to the parameters.
I cannot add "eval_training_data" on the go, so I need to run two separate training sessions.
Interestingly, with the "eval_training_data" parameter added I got:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.165
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.281
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.167
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.051
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.109
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.202
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.164
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.202
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.202
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.057
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.141
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.236
Whereas without "eval_training_data" I got:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.168
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.283
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.173
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.049
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.108
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.208
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.170
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.208
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.208
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.056
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.139
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.248
I just got confused. My questions are:
From what I could gather from a quick look at the repo:
eval_training_data evaluates on the training set only and excludes the eval set from the evaluation.
The scores being nearly the same is not a bad thing. It is actually good: it suggests that your model is not overfitting, which would be the case if the scores on the training data were significantly higher than the scores on the evaluation data. The evaluation scores are probably slightly higher in some cases because the evaluation set is much smaller, so the fractions can swing more on just a few good or bad predictions. Also, the model learns features and associates them with classes rather than memorizing individual examples, so don't expect it to perform amazingly on the training set just because it has seen all of those images. The better your model performs on the validation set, the better it generalizes.
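To make the comparison concrete, here is a minimal sketch that parses COCO summary lines like the ones quoted in the question and reports the training-minus-validation gap per metric. `parse_coco_summary` and `train_val_gap` are hypothetical helpers, not part of the Object Detection API, and the regular expression assumes the summary line format shown in the question.

```python
import re

# Matches one line of the COCO summary, e.g.
# "Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.165"
_LINE = re.compile(
    r"Average (Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?[\d.]+)"
)


def parse_coco_summary(text):
    """Return {(metric, iou, area, max_dets): value} for each summary line."""
    scores = {}
    for match in _LINE.finditer(text):
        _, metric, iou, area, max_dets, value = match.groups()
        scores[(metric, iou, area, int(max_dets))] = float(value)
    return scores


def train_val_gap(train_text, val_text):
    """Training score minus validation score for every shared metric."""
    train = parse_coco_summary(train_text)
    val = parse_coco_summary(val_text)
    return {key: round(train[key] - val[key], 3) for key in train if key in val}


# Two lines from each run in the question (training data first).
TRAIN = """
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.165
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.236
"""
VAL = """
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.168
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.248
"""

for key, gap in train_val_gap(TRAIN, VAL).items():
    print(key, gap)
```

A gap near zero or below zero, as here, means the training scores are not ahead of the validation scores; a large positive gap would be the overfitting signal described above.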
If you set eval_training_data=True, it evaluates the training set on its own; if it is set to False (the default), it evaluates only the eval set. I'm not sure whether a feature for evaluating both together has been added, but you can do it with a very small change to model_main.py. Just make this addition. It's not clean or optimized, but I think you'll see the point and can modify it accordingly.
# Add next to the other flag definitions in model_main.py.
flags.DEFINE_boolean('eval_training_data_and_eval_data', False,
                     'This will evaluate both the training data and the '
                     'evaluation data sequentially.')

  # Inside main(), in place of the existing `if FLAGS.checkpoint_dir:` block.
  # Note: as written, nothing is evaluated when checkpoint_dir is set but the
  # new flag is False.
  if FLAGS.checkpoint_dir:
    if FLAGS.eval_training_data_and_eval_data:
      # First pass: evaluate on the training data.
      name = 'training_data'
      input_fn = eval_on_train_input_fn
      if FLAGS.run_once:
        estimator.evaluate(input_fn,
                           steps=None,
                           checkpoint_path=tf.train.latest_checkpoint(
                               FLAGS.checkpoint_dir))
      else:
        model_lib.continuous_eval(estimator, FLAGS.checkpoint_dir, input_fn,
                                  train_steps, name)

      # Second pass: evaluate on the validation data. It starts only after
      # the first pass has returned.
      name = 'validation_data'
      # The first eval input will be evaluated.
      input_fn = eval_input_fns[0]
      if FLAGS.run_once:
        estimator.evaluate(input_fn,
                           steps=None,
                           checkpoint_path=tf.train.latest_checkpoint(
                               FLAGS.checkpoint_dir))
      else:
        model_lib.continuous_eval(estimator, FLAGS.checkpoint_dir, input_fn,
                                  train_steps, name)
  else:
    train_spec, eval_specs = model_lib.create_train_and_eval_specs(
        train_input_fn,
        eval_input_fns,
        eval_on_train_input_fn,
        predict_input_fn,
        train_steps,
        eval_on_train_data=False)

    # Currently only a single Eval Spec is allowed.
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
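If you would rather not patch model_main.py, one alternative is to launch two evaluation-only runs against the same checkpoint directory, one with --eval_training_data and one without. The sketch below only builds and prints the two command lines. It assumes the TF1 model_main.py with the --checkpoint_dir and --run_once flags that the snippet above reads; `build_eval_command` is a hypothetical helper, the paths are placeholders, and using a separate --model_dir per run is my own choice to keep the two sets of results apart.

```python
import shlex


def build_eval_command(pipeline_config, checkpoint_dir, output_dir,
                       on_training_data):
    """Build an evaluation-only model_main.py command (hypothetical helper)."""
    command = [
        "python", "model_main.py",
        "--alsologtostderr",
        "--pipeline_config_path=" + pipeline_config,
        "--model_dir=" + output_dir,
        # Setting checkpoint_dir switches model_main.py to evaluation only.
        "--checkpoint_dir=" + checkpoint_dir,
        "--run_once=True",
    ]
    if on_training_data:
        command.append("--eval_training_data=True")
    return command


# Placeholder paths; substitute your own.
commands = [
    build_eval_command("pipeline.config", "train_dir", "eval_on_train", True),
    build_eval_command("pipeline.config", "train_dir", "eval_on_val", False),
]
for command in commands:
    # Pass `command` to subprocess.run(command, check=True) to execute it.
    print(" ".join(shlex.quote(part) for part in command))
```

Running the two commands one after the other gives one COCO summary for the training data and one for the validation data from the same checkpoint.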
Also, make sure you're providing correct and distinct paths for the training and evaluation datasets. Note that if you tune the hyperparameters based on the validation score, that score becomes biased and is no longer a good estimate of generalization. To get a proper estimate, compute the score on a separate test set.
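A minimal sketch of such a three-way split, assuming your examples can be identified by a list of IDs before they are written to TFRecord files. `split_dataset` and the 80/10/10 fractions are illustrative assumptions, not something taken from the Object Detection API.

```python
import random


def split_dataset(example_ids, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle IDs once and cut them into disjoint train/val/test lists."""
    ids = list(example_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_test = int(len(ids) * test_fraction)
    n_val = int(len(ids) * val_fraction)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


train_ids, val_ids, test_ids = split_dataset(range(1000))
print(len(train_ids), len(val_ids), len(test_ids))  # 800 100 100
```

Tune on the validation IDs only, and score the test IDs once at the end so that number stays an unbiased estimate.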