google-cloud-platform google-cloud-vertex-ai

How do parallel trials in GCP Vertex AI work?


When you create a hyperparameter tuning job, you can specify the number of trials to run in parallel. After that, you also select the type and count of the workers. What I don't understand is this: when I run two or more trials in parallel with only one worker, each task is said to occupy 100% of the CPU. If one task occupies all of the CPU's resources, how can two of them run in parallel? Does GCP provision more than one machine?


Solution

  • Parallel trials let you run several trials concurrently, up to the number of parallel trials you specify for the job.

    You are correct that with one worker, each task can occupy 100% of the CPU. To run other trials in parallel, the hyperparameter tuning service provisions multiple training processing clusters (or multiple individual machines in the case of a single-process trainer). The worker pool spec that you set for your job is used for each individual training cluster, so yes, more than one machine is provisioned.

    Please see the Parallel Trials documentation for more details.

    For more details about hyperparameter tuning in general, you may refer to the Hyperparameter Tuning documentation.
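
To make the arithmetic concrete, here is a minimal sketch in plain Python that does not call Vertex AI at all. It assumes, as described in the answer above, that every concurrently running trial gets its own copy of the worker pool spec. The helper name `peak_machine_count` and the machine type are hypothetical and used only for illustration; `machine_spec` and `replica_count` follow the field names used in worker pool specs.

```python
def peak_machine_count(worker_pool_specs, parallel_trial_count, max_trial_count):
    """Estimate how many machines run at the same time.

    Assumption: each concurrent trial is given its own copy of the
    worker pool spec, so trials never share a machine.
    """
    # Machines needed by a single trial: sum the replicas of all pools.
    machines_per_trial = sum(spec["replica_count"] for spec in worker_pool_specs)
    # No more trials can run at once than the job has trials in total.
    concurrent_trials = min(parallel_trial_count, max_trial_count)
    return machines_per_trial * concurrent_trials


# One worker per trial (the situation in the question); machine type is an example.
worker_pool_specs = [
    {"machine_spec": {"machine_type": "n1-standard-4"}, "replica_count": 1},
]

print(peak_machine_count(worker_pool_specs, parallel_trial_count=2, max_trial_count=10))
```

With one worker and two parallel trials, the estimate is two machines, each free to use 100% of its own CPU. This is only an estimate of what gets provisioned, not a statement about billing or quota.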