
TensorFlow 2.12 GPU Utilisation with CUDA 11.8 - GPU not utilised during training even though it is shown as available


Update: Apparently the GPU was being used after all, only at around 30% of its maximum capability; I assume this is due to the network's low complexity and the batch size.

[Screenshot: fluctuations in GPU performance while the code is running.]

I'm using TensorFlow 2.12 with CUDA 11.8 and cuDNN 8.6, and I have installed the packages as indicated in their respective documentation (TensorFlow & CUDA). I verified that the GPU is visible from within my .py file using:

in: print(tf.config.list_physical_devices())
out: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

However, I cannot get my training to run on the GPU. I am sure I am missing a significant step in activating it, something like the way data flowing through the model is moved from CPU to GPU in PyTorch with the ".to(device)" method.

I have looked through the documentation but could not spot anything.

Your help is greatly appreciated, thanks!


Solution

  • The TensorFlow counterpart of the PyTorch operation you mention, and the right tool for your use case here, is tf.device.

    In general, TensorFlow automatically places operations on the available devices, preferring the GPU when one is present.

    As you have rightly mentioned, tf.config.list_physical_devices() lists the devices available to your runtime (CPU/GPU/TPU).

    With tf.debugging.set_log_device_placement(True) you can find out to which devices your operations and tensors are assigned.

    import tensorflow as tf
    print(tf.config.list_physical_devices())
    

    Output:

    [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
    

    For manually assigning operations to the CPU:

    tf.debugging.set_log_device_placement(True)

    with tf.device('/CPU:0'):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    

    Output:

    Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:CPU:0
    Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:CPU:0
    

    For doing the same with the GPU:

    with tf.device('/GPU:0'):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    

    Output:

    Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
    Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
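
    Creating constants only exercises memory placement; to see actual compute land on the GPU, you can run an op on those tensors. A short sketch continuing the snippet above (tf.matmul is valid here because a is 2x3 and b is 3x2):

    with tf.device('/GPU:0'):
        # Runs the matrix multiplication on the GPU; with device
        # placement logging enabled, this should print a line like
        # "Executing op MatMul in device .../device:GPU:0".
        c = tf.matmul(a, b)
    print(c)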
    

    For more information on manual device placement, please refer to the TensorFlow guide on using GPUs.
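
    As for training itself: unlike PyTorch, Keras needs no explicit ".to(device)" call; model.fit places its ops on the visible GPU automatically. A minimal sketch to verify this (the toy data, model, and hyperparameters here are made up purely for illustration):

    import numpy as np
    import tensorflow as tf

    tf.debugging.set_log_device_placement(True)  # log where each op runs

    # Hypothetical toy dataset, just to observe device placement
    x = np.random.rand(1024, 32).astype("float32")
    y = np.random.randint(0, 2, size=(1024, 1)).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # With a GPU visible, the logged ops should show .../device:GPU:0
    model.fit(x, y, epochs=2, batch_size=128)

    Note that a small model like this will barely load the GPU, which is consistent with the ~30% utilisation mentioned in your update; larger models and bigger batch sizes raise utilisation.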