
Does TensorFlow "know" when not to put data on the GPU?


I was trying to use TensorBoard along with TensorFlow, and I did this as a setup:

rand = tf.placeholder(dtype=tf.float32)    # this will be visualised in tensorboard later on
tf.summary.image('random_noise_visualisation', rand, max_outputs=5)
merged_summary_op = tf.summary.merge_all() # to me this seems like a helper to
                                           # merge all tensorboard related operations

Then I evaluate my merged_summary_op and feed it a very large array, about 1 GB in size.

It does not seem to use any extra GPU memory beyond what was already in use.

I also tried evaluating just my rand placeholder, thinking maybe summary ops have special handling that prevents data from going to the GPU. I did:

random_value = np.random.randn(3000, 224, 224, 1)
sess.run(rand, feed_dict={rand: random_value})

Again, there was no extra GPU utilisation.

However, when I do

sess.run(rand + 2, feed_dict={rand: random_value})  # forced to do some calculation

There is additional GPU utilisation, with an increase of about 1 GB.

For all the above experiments, I used my session as:

sess = tf.InteractiveSession(graph=tf.Graph())

My questions are:

  • Does TensorFlow know when not to bother sending a Tensor to the GPU?
  • Will changing from an InteractiveSession to a normal session affect this behaviour?
  • Is there any particular documentation for this?

Solution

  • Does TensorFlow know when not to bother sending a Tensor to the GPU?

    Yes.

    In fact, in your first rand experiment TensorFlow figured out not to bother any device at all, because the supplied fetch rand is already in the feed_dict. This fairly simple optimization can be seen in session.py:

    self._final_fetches = [x for x in self._fetches if x not in feeds]
    

    ... and later on in the same file:

    # We only want to really perform the run if fetches or targets are provided,
    # or if the call is a partial run that specifies feeds.
    if final_fetches or final_targets or (handle and feed_dict_tensor):
      results = self._do_run(handle, final_targets, final_fetches,
                             feed_dict_tensor, options, run_metadata)
    else:
      results = []
    
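    In practice this means your first rand experiment never touches any device: since the fetch is also a feed, the fed value is simply handed back (converted to the placeholder's dtype). A minimal sketch of what that looks like from the caller's side (assuming TensorFlow 1.x graph mode, as in your snippets):

    import numpy as np
    import tensorflow as tf

    rand = tf.placeholder(dtype=tf.float32)
    with tf.Session() as sess:
      value = np.random.randn(300, 224, 224, 1)      # float64 on the host
      out = sess.run(rand, feed_dict={rand: value})  # fetch is also a feed: the run is short-circuited
      print(out.dtype)                               # float32 - only the dtype conversion happened
      print(np.allclose(out, value))                 # True - the fed value came straight back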

    The second experiment doesn't fall under this optimization, so the graph is truly evaluated. TensorFlow pinned the placeholder to the available GPU, and consequently the addition as well, which explains the GPU utilization.

    This can be seen vividly if you run the session with log_device_placement=True:

    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
      random_value = np.random.randn(300, 224, 224, 1)
      print(sess.run(rand + 2, feed_dict={rand: random_value}).shape)
    

    As far as the image summary op is concerned, it is indeed special: the ImageSummary op doesn't have a GPU implementation. Here's the source code (core/kernels/summary_image_op.cc):

    REGISTER_KERNEL_BUILDER(Name("ImageSummary").Device(DEVICE_CPU),
                            SummaryImageOp);
    

    Hence, if you try to place it on the GPU manually, session.run() will throw an error:

    # THIS FAILS!
    with tf.device('/gpu:0'):
      tf.summary.image('random_noise_visualisation', rand, max_outputs=5)
      merged_summary_op = tf.summary.merge_all() # to me this seems like a helper to
                                                 # merge all tensorboard related operations
    

    This seems reasonable, since summary ops don't perform any complex calculations and mostly deal with disk I/O.
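
    If you nevertheless want to keep the manual device block, allow_soft_placement=True in the session config lets TensorFlow quietly fall back to the CPU kernel instead of raising an error. A small sketch (assuming numpy as np and the rand / merged_summary_op built above):

    config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
    with tf.Session(config=config) as sess:
      # ImageSummary has no GPU kernel, so it is placed on the CPU
      # even though /gpu:0 was requested when the op was created.
      summary = sess.run(merged_summary_op,
                         feed_dict={rand: np.random.randn(5, 224, 224, 1)})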

    ImageSummary isn't the only CPU-only op; all summary ops are, for example. There is a related GitHub issue, but currently there's no better way to check whether a particular op is supported on the GPU than to check the source code.

    In general, TensorFlow tries to utilize as many of the available resources as possible, so when GPU placement is possible and no other restrictions apply, the engine tends to choose the GPU over the CPU.
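
    Conversely, if you want to guarantee that a large feed never occupies GPU memory, you can pin the placeholder and the ops consuming it to the CPU explicitly. A small sketch (rand_cpu and shifted are just illustrative names):

    with tf.device('/cpu:0'):
      rand_cpu = tf.placeholder(dtype=tf.float32)
      shifted = rand_cpu + 2   # stays on the CPU, so no GPU memory is consumed

    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
      sess.run(shifted, feed_dict={rand_cpu: np.random.randn(300, 224, 224, 1)})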

  • Will changing from an InteractiveSession to a normal session affect this behaviour?

    No. InteractiveSession doesn't affect the device placement logic. The only big difference is that InteractiveSession makes itself the default session upon creation, while Session is the default only within a with block.
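
    For example (a small sketch of the default-session difference only, not of device placement):

    sess = tf.InteractiveSession()
    print(tf.constant(1).eval())    # works: the InteractiveSession is the default session
    sess.close()

    with tf.Session() as sess:
      print(tf.constant(2).eval())  # works only inside the with block
    # calling .eval() out here would raise an error: no default session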

  • Is there any particular documentation for this?

    I might be wrong here, but most likely there isn't. For me, the best source of truth is the source code.