Tags: tensorflow, gpu, google-dl-platform

Why doesn't TensorFlow on a Google Deep Learning VM use the GPU?


I am using a Google Deep Learning VM from the Google Marketplace, and I opted for an NVIDIA K80 GPU. I am trying to train an object detection model using the Object Detection API. However, I notice that TensorFlow is not using the GPU by default (the code I used to check is below).

My assumption here is that this instance comes with all the required NVIDIA drivers, so it is not a driver-related problem.

Further investigation showed that I had two installations of TensorFlow (tensorflow 1.12.0 and tensorflow-gpu 1.12.0), so I uninstalled the CPU version. However, that still did not help.
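For anyone wanting to confirm which TensorFlow packages a given interpreter actually sees, the following is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`). The helper names `tensorflow_packages`, `has_cpu_gpu_conflict`, and `report` are hypothetical and chosen only for illustration.

```python
from importlib import metadata  # standard library, Python 3.8+


def tensorflow_packages(name_version_pairs):
    """Return {normalized_name: version} for names starting with 'tensorflow'."""
    found = {}
    for name, version in name_version_pairs:
        if not name:
            continue  # skip distributions with missing metadata
        normalized = name.lower().replace("_", "-")
        if normalized.startswith("tensorflow"):
            found[normalized] = version
    return found


def has_cpu_gpu_conflict(found):
    """True when both the CPU and the GPU package are installed side by side."""
    return "tensorflow" in found and "tensorflow-gpu" in found


def report():
    """Print the TensorFlow-related packages visible to this interpreter."""
    pairs = ((d.metadata["Name"], d.version) for d in metadata.distributions())
    found = tensorflow_packages(pairs)
    for name, version in sorted(found.items()):
        print(name, version)
    if has_cpu_gpu_conflict(found):
        print("Both tensorflow and tensorflow-gpu are installed.")
    return found
```

Calling `report()` with the same interpreter that runs the training script lists what is installed for that interpreter specifically, which matters when a VM has more than one Python environment.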

I used the code below to check whether TensorFlow is using the GPU:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
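The check above prints every local device. A slightly narrower variant, sketched here with hypothetical helper names (`gpu_device_names`, `check_local_gpus`), keeps only the entries whose `device_type` is `"GPU"`, so an empty result means TensorFlow sees no GPU at all. It assumes the same `device_lib` module used above is importable.

```python
def gpu_device_names(devices):
    """Return the names of all devices whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


def check_local_gpus():
    """List the GPU devices TensorFlow can see on this machine."""
    # Imported lazily so the filtering helper can be used without TensorFlow.
    from tensorflow.python.client import device_lib

    return gpu_device_names(device_lib.list_local_devices())
```

Running `print(check_local_gpus())` on the VM should print an empty list in the situation described here, where only a CPU device is reported.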

For reference, I am using the command below for object detection training. It runs fine on the Deep Learning VM but does not use the GPU.

python $Tensor_path/legacy/train.py --logtostderr \
    --train_dir=$Train_path/training/ \
    --pipeline_config_path=$Train_path/training/ssd_inception_v2_pets.config

Output (I would have expected to see the details of the GPU device being used):

[name: "/cpu:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 18292259467280600161
]

Solution

  • I was able to resolve this by deleting the old instance and starting fresh with a new one. My guess is that the TensorFlow GPU installation got corrupted while I was installing the Object Detection API. I followed the installation steps here: https://cloud.google.com/solutions/creating-object-detection-application-tensorflow

    Most likely this line is the culprit, since it installs a CPU-only TensorFlow wheel:

    pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp27-none-linux_x86_64.whl
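To see what a wheel URL like the one above would install before running `pip`, the file name can be split into its parts. This is only a sketch: it assumes the file name follows the standard wheel naming scheme (distribution, version, optional build tag, Python tag, ABI tag, platform tag), and `parse_wheel_url` is a hypothetical helper, not part of pip.

```python
import posixpath
from urllib.parse import urlparse


def parse_wheel_url(url):
    """Split the wheel file name at the end of a URL into its naming fields."""
    filename = posixpath.basename(urlparse(url).path)
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError("unexpected wheel file name: %r" % filename)
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


WHEEL_URL = (
    "https://storage.googleapis.com/tensorflow/linux/cpu/"
    "tensorflow-1.1.0-cp27-none-linux_x86_64.whl"
)
info = parse_wheel_url(WHEEL_URL)
print(info["distribution"], info["version"], info["python_tag"])
```

For this URL the distribution is `tensorflow` (not `tensorflow-gpu`) at version 1.1.0, built for CPython 2.7, which is consistent with the guess that running it replaced or shadowed the GPU build.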