tensorflow, google-cloud-platform, google-cloud-ml-engine

Google Cloud ML Engine GPU Utilization


If I am using --scale-tier BASIC_GPU in a Google Cloud ML Engine job, how can I view the GPU utilization? I can see CPU utilization and memory utilization on the "job details" tab, but I'd like to know how heavily the GPU is being used. Is GPU usage included in the CPU figure, or is there another tab that shows GPU utilization?

Additionally, is there any way to see which ops account for most of the CPU usage? My CPU utilization is very high, my memory utilization is very low, and my input producer is always full (100%). I'd like to understand where the time is being spent so that I can optimize my model's performance.


Solution

  • There is currently no way to see GPU utilization with Cloud ML Engine.

    As for finding the expensive ops, TensorFlow has a feature called Timeline (the timeline module in tensorflow.python.client) which can be used to obtain per-op profile data. Here's a blog post describing how to use it.
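
As a rough illustration of the Timeline approach, here is a minimal sketch, assuming TensorFlow 1.x graph mode (on 2.x the same names live under tf.compat.v1). The helper names capture_timeline and summarize_ops and the file name timeline.json are hypothetical, made up for this example. The first helper traces one sess.run call and writes a Chrome trace file. The second sums the recorded durations per event name, which should give a first idea of where the time goes. It assumes the trace has a top-level "traceEvents" list whose complete events ("ph" equal to "X") carry "name" and "dur" (microseconds) fields.

```python
import collections
import json


def capture_timeline(sess, fetches, trace_path="timeline.json"):
    """Run `fetches` once with full tracing and write a Chrome trace file.

    Assumes TensorFlow 1.x; `sess` is an existing tf.Session whose
    variables (and any queue runners) are already initialized/started.
    """
    # Imported here so the summarizer below works without TensorFlow installed.
    import tensorflow as tf
    from tensorflow.python.client import timeline

    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(fetches, options=run_options, run_metadata=run_metadata)

    # Convert the collected step stats into Chrome's trace event format.
    trace = timeline.Timeline(run_metadata.step_stats)
    with open(trace_path, "w") as f:
        f.write(trace.generate_chrome_trace_format())
    return trace_path


def summarize_ops(trace_path, top=10):
    """Return the `top` event names by total recorded duration (microseconds)."""
    with open(trace_path) as f:
        events = json.load(f)["traceEvents"]

    totals = collections.Counter()
    for event in events:
        # "X" marks a complete event with a start time and a duration;
        # metadata and flow events are skipped.
        if event.get("ph") == "X":
            totals[event["name"]] += event.get("dur", 0)
    return totals.most_common(top)
```

Inside a training loop you might call capture_timeline(sess, train_op) for a single step (train_op standing in for whatever you normally fetch), then either load timeline.json in Chrome at chrome://tracing or print summarize_ops("timeline.json"). Tracing adds overhead, so it is probably best to profile only an occasional step rather than every one.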