How to use evaluation_loop with train_loop in tf-slim

I'm trying to implement a few different models and train them on CIFAR-10, and I want to use TF-slim to do this. It looks like TF-slim has two main loops that are useful during training: train_loop and evaluation_loop.

My question is: what is the canonical way to use these loops? As a followup: is it possible to use early stopping with train_loop?

Currently I have a model and my training file train.py looks like this

import ...
train_log_dir = ...

with tf.device("/cpu:0"):
  images, labels, dataset = set_up_input_pipeline_with_fancy_prefetching( 
                                                                subset='train', ... )
logits, end_points = set_up_model( images ) // Possibly using many GPUs
total_loss = set_up_loss( logits, labels, dataset )
optimizer, global_step = set_up_optimizer( dataset )
train_tensor = slim.learning.create_train_op( 
                                      total_loss, 
                                      optimizer,
                                      global_step=global_step,
                                      clip_gradient_norm=FLAGS.clip_gradient_norm,
                                      summarize_gradients=True)
slim.learning.train(train_tensor, 
                      logdir=train_log_dir,
                      local_init_op=tf.initialize_local_variables(),
                      save_summaries_secs=FLAGS.save_summaries_secs,
                      save_interval_secs=FLAGS.save_interval_secs)

Which is awesome so far - my models all train and converge nicely. I can see this from the events in train_log_dir where all the metrics are going in the right direction. And going in the right direction makes me happy.

But I'd like to check that the metrics are improving on the validation set, too. I don't know of any way to do with TF-slim in a way that plays nicely with the training loop, so I created a second file called eval.py which contains my evaluation loop.

import ...
train_log_dir = ...

with tf.device("/cpu:0"):
  images, labels, dataset = set_up_input_pipeline_with_fancy_prefetching( 
                                                                subset='validation', ... )
logits, end_points = set_up_model( images )
summary_ops, names_to_values, names_to_updates = create_metrics_and_summary_ops( 
                                                                logits,
                                                                labels,
                                                                dataset.num_classes() )

slim.get_or_create_global_step()
slim.evaluation.evaluation_loop(
      '',
      checkpoint_dir=train_log_dir,
      logdir=train_log_dir,
      num_evals=FLAGS.num_eval_batches,
      eval_op=names_to_updates.values(),
      summary_op=tf.merge_summary(summary_ops),
      eval_interval_secs=FLAGS.eval_interval_secs,
      session_config=config)

Questions:

1) I currently have this model for the evaluation_loop hogging up an entire GPU, but it's rarely being used. I assume there's a better way to allocate resources. It would be pretty nice if I could use the same evaluation_loop to monitor the progress of multiple different models (checkpoints in multiple directories). Is something like this possible?

2) There's no feedback between the evaluation and training. I'm training a ton of models and would love to use early stopping to halt the models which aren't learning or are not converging. Is there a way to do this? Ideally using information from the validation set, but if it has to be just based on the training data that's okay, too.

3) Is my workflow all wrong and I should be structuring it differently? It's not clear from the documentation how to use evaluation in conjunction with training.

Update ~~It seems that as of TF r0.11 I'm also getting a segfault when calling slim.evaluation.evaluation_loop. It only happens sometimes (for me when I dispatch my jobs to a cluster). It happens in sv.managed_session--specifically prepare_or_wait_for_session.~~ This was just due to evaluation loop (a second instance of tensorflow) trying to use the GPU, which was already requisitioned by the first instance.

Solution

evaluation_loop is meant to be used (as you are currently using it) with a single directory. If you want to be more efficient, you could use slim.evaluation.evaluate_once and add the appropriate logic for swapping directories as you find appropriate.
You can do this by overriding the slim.learning.train(..., train_step_fn) argument. This argument replaces the 'train_step' function with a custom function. Here, you can supply custom training function which returns the 'total_loss' and 'should_stop' values as you see fit.
Your workflow looks great, this is probably the most common workflow for learning/eval using TF-Slim.