I currently use Vertex AI custom training, where:
However, when doing so I notice bouts of 0% GPU utilisation, even though GPU memory stays roughly constant at ~80%. I presume this is an I/O bottleneck, since the job streams its data from a remote GCS bucket.
What's the most efficient way of loading data into my training application? Would it be better to download the data into the training container and load it locally, rather than streaming it from the GCS bucket?
I found a blog post from GCP that answers this:
https://cloud.google.com/blog/products/ai-machine-learning/efficient-pytorch-training-with-vertex-ai
TL;DR: use `torchdata.datapipes.iter.WebDataset`.
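For context, the WebDataset approach works because it packs many samples into large tar "shards" that are read sequentially, instead of issuing one GCS request per small file. Below is a minimal, standard-library-only sketch of that shard format (the file name, keys, and byte contents are made up for illustration); in a real training job you would point `torchdata.datapipes.iter.WebDataset` (or the `webdataset` library) at shards like these in your bucket rather than reading them by hand:

```python
import io
import tarfile
from collections import defaultdict

# A WebDataset shard is a plain tar archive in which all files belonging to
# one sample share a basename (the "key") and differ only by extension,
# e.g. sample0.jpg + sample0.cls. Streaming the tar turns many small random
# reads against GCS into one large sequential read.

def write_shard(path, samples):
    """Write {key: {ext: bytes}} samples into a WebDataset-style tar shard."""
    with tarfile.open(path, "w") as tar:
        for key, files in samples.items():
            for ext, data in files.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

def read_shard(path):
    """Stream a shard back, regrouping files into per-sample dicts.
    This mirrors what torchdata's .webdataset() grouping does."""
    grouped = defaultdict(dict)
    with tarfile.open(path, "r") as tar:
        for member in tar:
            key, _, ext = member.name.rpartition(".")
            grouped[key][ext] = tar.extractfile(member).read()
    return dict(grouped)

if __name__ == "__main__":
    samples = {
        "sample0": {"jpg": b"<jpeg bytes>", "cls": b"3"},
        "sample1": {"jpg": b"<jpeg bytes>", "cls": b"7"},
    }
    write_shard("shard-000000.tar", samples)
    for key, files in read_shard("shard-000000.tar").items():
        print(key, sorted(files))
```

Because each shard is read front to back, the GPU sees a steady stream of batches instead of stalling on per-file round trips to the bucket.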