I cannot start a tensorboard instance when tensorflow is already running and using the GPU. The error is below. Apparently Tensorflow blocks all GPU memory on start independent of what it actually requires. Is there a way to start tensorboard while a tensorflow process is running or does it always have be started first?
totalMemory: 5,93GiB freeMemory: 41,56MiB
2018-06-02 15:28:11.053634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-02 15:28:11.321850: E tensorflow/core/common_runtime/direct_session.cc:154] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
Traceback (most recent call last):
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/bin/tensorboard", line 11, in <module>
sys.exit(run_main())
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/main.py", line 36, in run_main
tf.app.run(main)
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/main.py", line 45, in main
default.get_assets_zip_provider())
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/program.py", line 166, in main
tb = create_tb_app(plugins, assets_zip_provider)
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/program.py", line 201, in create_tb_app
flags=FLAGS)
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/backend/application.py", line 126, in standard_tensorboard_wsgi
plugin_instances = [constructor(context) for constructor in plugins]
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/backend/application.py", line 126, in <listcomp>
plugin_instances = [constructor(context) for constructor in plugins]
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/plugins/beholder/beholder_plugin.py", line 47, in __init__
self.most_recent_frame = im_util.get_image_relative_to_script('no-data.png')
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 254, in get_image_relative_to_script
return read_image(filename)
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 242, in read_image
return np.array(decode_png(image_file.read()))
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 159, in __call__
self._lazily_initialize()
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 137, in _lazily_initialize
self._session = tf.Session(graph=graph, config=config)
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1560, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/home/pascalwhoop/Documents/Code/University/powerTAC/python-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 633, in __init__
self._session = tf_session.TF_NewSession(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
Tensorboard 1.7.0 seems to occupy about 150MB on the GPU. See this open Tensorboard issue. Looks like it's in the process of being resolved.
An option in the interim is to limit the percentage of memory Tensorflow is allowed to allocate per process upfront as detailed in this answer. This way you can ensure a certain percentage of memory is reserved for other tasks on the GPU that you might want to run during training.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))