python-2.7, tensorflow, tf-slim

tensorflow slim concurrent train and evaluation loops; single device


I am interested in using the TensorFlow slim library (tf.contrib.slim) to periodically evaluate a model's performance on an entire test set during training. The documentation is pretty clear that slim.evaluation.evaluation_loop is the way to go, and it looks promising. The issue is that I don't have a second GPU to spare: the model's parameters take up an entire GPU's worth of memory, and I would still like to do concurrent evaluation.

For example, if I had 2 GPUs, I could run a Python script that terminates with slim.learning.train() on the first GPU, and another that terminates with slim.evaluation.evaluation_loop() on the second GPU.
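Concretely, each script would pin itself to one device before TensorFlow initializes the GPUs, roughly like the sketch below (the directory paths, train_op, and eval ops are placeholders for my own graph):

    # train.py -- pinned to the first GPU
    import os
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # set before TensorFlow initializes the devices
    import tensorflow as tf
    slim = tf.contrib.slim
    # ... build the graph and create train_op ...
    slim.learning.train(train_op, '/tmp/model/train')

    # eval.py -- pinned to the second GPU
    import os
    os.environ['CUDA_VISIBLE_DEVICES'] = '1'
    import tensorflow as tf
    slim = tf.contrib.slim
    # ... build the graph and the metric update ops ...
    slim.evaluation.evaluation_loop(
        '', '/tmp/model/train', '/tmp/model/eval',
        num_evals=num_batches, eval_op=eval_op)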

Is there an approach that can manage a single GPU's resources for both tasks? tf.train.Supervisor comes to mind, but I honestly don't know.


Solution

  • You can partition the GPU's memory between the two processes.

    You can set the fraction of GPU memory to be used for training and for evaluation separately. The code below gives the calling process 30% of the GPU's memory:

        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
        sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
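    With slim you typically don't create the session yourself, so in practice you can pass this configuration through the session_config argument that both slim.learning.train and slim.evaluation.evaluation_loop accept. A minimal sketch, assuming train_op, names_to_updates, num_batches, and the directory paths come from your own code, and choosing the two fractions to sum to less than 1.0 so the GPU keeps some headroom:

        # train.py -- training process, capped at ~60% of GPU memory
        import tensorflow as tf
        slim = tf.contrib.slim
        # ... build the graph and create train_op ...
        train_config = tf.ConfigProto(
            gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.6))
        slim.learning.train(train_op, '/tmp/model/train',
                            session_config=train_config)

        # eval.py -- concurrent evaluation process, capped at ~30%
        import tensorflow as tf
        slim = tf.contrib.slim
        # ... build the graph and the streaming metric update ops ...
        eval_config = tf.ConfigProto(
            gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.3))
        slim.evaluation.evaluation_loop(
            '',                     # master: run locally
            '/tmp/model/train',     # checkpoint_dir: watch for new checkpoints
            '/tmp/model/eval',      # logdir for evaluation summaries
            num_evals=num_batches,  # batches needed to cover the test set
            eval_op=list(names_to_updates.values()),
            eval_interval_secs=600,
            session_config=eval_config)

    Both scripts can then run side by side on one GPU: evaluation_loop wakes up every eval_interval_secs, restores the newest checkpoint written by the trainer, and runs the eval ops over the whole test set.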