I have a custom dataset of approximately 20k images (10% used for validation). Roughly 1/3 are labeled class 0, 1/3 class 1, and 1/3 contain neither class and carry a -1 label.
I have run approximately 400 epochs; over the last 40 epochs validation mAP has increased from 0.817 to 0.831, and training cross-entropy loss has dropped from 0.377 to 0.356.
the last epoch had validation mAP <score>=(0.83138943309)
train cross_entropy <loss>=(0.356147519184)
train smooth_l1 <loss>=(0.150637295831)
The training loss still looks like it has a reasonable amount left to reduce, but I don't have any experience with ResNet (on YOLOv3 this dataset quickly went below 0.1).
Is my approach of having 1/3 of the training images contain neither class reasonable? When I was training YOLOv3 it seemed to help the network avoid false positives.
Is there any rule of thumb that helps me estimate how many epochs are appropriate based on the number of classes/images?
It's cost me about $100 on AWS to get to this point, and I'm not sure whether it needs another $100 or $1000 to reach the optimal mAP. At the current rate, one hour of training buys about 1% improvement, and I'd expect that to slow down.
Are there other metrics I should be looking at? (If so, how do I export them?)
Are there any hyperparameters I should change before resuming training?
My hyperparameters are:
base_network='resnet-50',
num_classes=2,
mini_batch_size=32,
epochs=200,
learning_rate=0.001,
lr_scheduler_step='3,6',
lr_scheduler_factor=0.1,
optimizer='sgd',
momentum=0.9,
weight_decay=0.0005,
overlap_threshold=0.5,
nms_threshold=0.45,
image_shape=416,
label_width=480,
num_training_samples=19732
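One thing worth noticing in these hyperparameters: assuming the scheduler follows the usual step-schedule convention (multiply the rate by lr_scheduler_factor at each epoch listed in lr_scheduler_step), the effective learning rate can be sketched as below. The function name is illustrative, not a SageMaker API:

```python
def effective_lr(epoch, base_lr=0.001, steps=(3, 6), factor=0.1):
    """Learning rate in effect during `epoch` (0-indexed), assuming a
    step schedule that decays by `factor` at each epoch in `steps`."""
    drops = sum(1 for s in steps if epoch >= s)
    return base_lr * factor ** drops

# With steps at epochs 3 and 6: ~1e-3 for epochs 0-2, ~1e-4 for 3-5,
# and ~1e-5 from epoch 6 onward -- so the last ~394 epochs all ran at
# a rate 100x smaller than the base rate, which may explain the very
# slow mAP improvement late in training.
for e in (0, 3, 6, 400):
    print(e, effective_lr(e))
```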
thanks, John
It's hard to say ahead of time for a custom dataset because you're dealing with many different variables. Tracking the validation mAP is of course a good way to tell when to stop: for example, when mAP stops increasing or levels out.
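That "levels out" rule can be made concrete with a simple patience check: stop once the best recent mAP hasn't beaten the earlier best by some margin. A minimal sketch (names and thresholds are illustrative, not from any SageMaker API):

```python
def should_stop(map_history, patience=10, min_delta=0.001):
    """True if the best validation mAP over the last `patience` epochs
    failed to beat the best earlier mAP by at least `min_delta`."""
    if len(map_history) <= patience:
        return False  # not enough history to judge a plateau
    best_before = max(map_history[:-patience])
    recent_best = max(map_history[-patience:])
    return recent_best - best_before < min_delta
```

With the numbers in the question (0.817 to 0.831 over 40 epochs), this rule would keep training, since mAP is still creeping upward.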
Beyond that, I would recommend looking at others who used the same architecture and similar parameters to gain insight. You mentioned a custom dataset, but for ImageNet, DAWNBench publishes that information; for example, this page lists the per-epoch hyperparameters of a related setup for you to explore.
I would also urge you to look at fine-tuning pre-trained models to save money and computation. See the Vision section here and here, and https://github.com/apache/incubator-mxnet/issues/4616 for information on fine-tuning the FC layers.
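On your question about exporting metrics: the per-epoch numbers you quoted (e.g. `validation mAP <score>=(0.83138943309)`) appear as lines in the training job's log, so one low-tech option is to scrape them from the log text. A sketch assuming that exact `<score>=(...)` / `<loss>=(...)` line format; the regex and function name are mine, not from any AWS tool:

```python
import re

# Matches lines like: "validation mAP <score>=(0.83138943309)"
METRIC_RE = re.compile(
    r"(validation mAP|train cross_entropy|train smooth_l1)"
    r" <(?:score|loss)>=\(([\d.]+)\)"
)

def parse_metrics(log_text):
    """Return {metric_name: [values in order of appearance]}."""
    out = {}
    for name, value in METRIC_RE.findall(log_text):
        out.setdefault(name, []).append(float(value))
    return out
```

Feeding this the concatenated log gives you per-epoch series you can plot, or pass to a plateau check like the one above to decide whether another $100 of training is likely to pay off.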