I was trying to use TensorBoard along with TensorFlow, and I did this as a setup:
rand = tf.placeholder(dtype=tf.float32) # this will be visualised in tensorboard later on
tf.summary.image('random_noise_visualisation', rand, max_outputs=5)
merged_summary_op = tf.summary.merge_all() # to me this seems like a helper to
# merge all tensorboard related operations
Then I evaluate my merged_summary_op and feed it a very large array, about 1 GB in size.
It does not seem to use any extra GPU memory beyond what was already in use.
I also just tried to evaluate my rand placeholder, thinking maybe summary ops have special handling to prevent data from going to the GPU. I did:
random_value = np.random.randn(3000,224,224,1)
sess.run(rand, feed_dict={rand: random_value})
There was, again, no extra GPU utilisation.
However, when I do
sess.run(rand + 2, feed_dict={rand: random_value})  # forced to do some calculation
There is additional GPU utilisation, with an increase of about 1 GB.
For all the above experiments, I used my session as:
sess = tf.InteractiveSession(graph=tf.Graph())
My questions are:
Does TensorFlow know when not to bother sending a Tensor to the GPU?
Yes.
In fact, in your first rand experiment TensorFlow figured out not to bother any device, because the supplied fetch rand is already in feed_dict. This fairly simple optimization can be seen in session.py:
self._final_fetches = [x for x in self._fetches if x not in feeds]
... and later on in the same file:
# We only want to really perform the run if fetches or targets are provided,
# or if the call is a partial run that specifies feeds.
if final_fetches or final_targets or (handle and feed_dict_tensor):
  results = self._do_run(handle, final_targets, final_fetches,
                         feed_dict_tensor, options, run_metadata)
else:
  results = []
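To convince yourself that this short-circuit really happens, you can request full tracing and check that no kernels ran. This is a self-contained sketch (it re-creates the placeholder) resting on the assumption that a run which is short-circuited leaves the collected step stats empty:

import numpy as np
import tensorflow as tf

rand = tf.placeholder(dtype=tf.float32)
value = np.random.randn(4, 4).astype(np.float32)

with tf.Session() as sess:
    opts = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    meta = tf.RunMetadata()
    out = sess.run(rand, feed_dict={rand: value}, options=opts, run_metadata=meta)
    print(np.array_equal(out, value))      # expected: True, the fed value is handed straight back
    print(len(meta.step_stats.dev_stats))  # expected: 0, no device executed anything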
The second experiment doesn't fall into this optimization, so the graph is truly evaluated. TensorFlow pinned the placeholder to the available GPU, and consequently the addition as well, which explains the GPU utilization.
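As a sanity check on the "about 1 GB" figure, here is my own back-of-the-envelope arithmetic (not part of the original observation): the placeholder is float32, so the fed tensor plus the freshly allocated output of rand + 2 account for roughly that much memory.

bytes_in = 3000 * 224 * 224 * 1 * 4    # fed tensor, float32: ~574 MiB
bytes_out = bytes_in                   # rand + 2 allocates an equal-sized output
print((bytes_in + bytes_out) / 2**30)  # ~1.12 GiB, consistent with the observed ~1 GB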
The placement can be seen vividly if you run the session with log_device_placement=True:
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    random_value = np.random.randn(300, 224, 224, 1)
    print(sess.run(rand + 2, feed_dict={rand: random_value}).shape)
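Conversely (my addition, not part of the original answer), you can keep the computation off the GPU by pinning it explicitly; the placement log should then report the add op on /cpu:0:

with tf.device('/cpu:0'):
    shifted = rand + 2  # explicitly pinned to the CPU
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(shifted, feed_dict={rand: random_value}).shape)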
As far as the image summary op is concerned, it is indeed special: the ImageSummary op doesn't have a GPU implementation. Here's the source code (core/kernels/summary_image_op.cc):
REGISTER_KERNEL_BUILDER(Name("ImageSummary").Device(DEVICE_CPU),
                        SummaryImageOp);
Hence, if you try to place it on the GPU manually, session.run() will throw an error:
# THIS FAILS!
with tf.device('/gpu:0'):
    tf.summary.image('random_noise_visualisation', rand, max_outputs=5)
    merged_summary_op = tf.summary.merge_all()  # merge all tensorboard-related summary ops
This seems reasonable, since summary ops don't perform any complex calculations and mostly deal with disk I/O.
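If you need such code to run anyway, a common workaround (my addition, not from the original answer) is to enable soft placement, which makes TensorFlow silently fall back to the CPU instead of raising:

# Soft placement ignores the impossible /gpu:0 pin and falls back to the CPU.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(merged_summary_op,
             feed_dict={rand: np.random.randn(5, 224, 224, 1)})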
ImageSummary isn't the only CPU-only op: all summary ops are, for example. There is a related GitHub issue, but currently there's no better way to check whether a particular op is supported on the GPU than to inspect the source code.
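That said, recent 1.x builds ship a small helper in a private module that can serve as a programmatic check. Treat both the module path and its availability as assumptions on my part, and verify against your version:

from tensorflow.python.framework import kernels  # private API; location may change between versions

# Each registered kernel records the device it was built for.
for kernel in kernels.get_registered_kernels_for_op('ImageSummary').kernel:
    print(kernel.device_type)  # for ImageSummary this should print only 'CPU'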
In general, TensorFlow tries to utilize as much of the available resources as possible, so when GPU placement is possible and no other restrictions apply, the engine tends to choose the GPU over the CPU.
Will changing from InteractiveSession to a normal Session affect this behaviour?
No. InteractiveSession doesn't affect the device placement logic. The only big difference is that InteractiveSession makes itself the default session upon creation, while Session is the default only within a with block.
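A minimal illustration of that difference (my own example):

sess = tf.InteractiveSession()
print(tf.get_default_session() is sess)      # True: default from the moment of creation
sess.close()

sess = tf.Session()
print(tf.get_default_session() is sess)      # False: a plain Session is not default by itself
with sess.as_default():
    print(tf.get_default_session() is sess)  # True only inside the with block
sess.close()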
Is there any particular documentation for this?
I may be wrong here, but likely not. For me the best source of truth is the source code.