Search code examples
google-cloud-platformparallel-processinggcloudbazel

Unable to build in parallel using the Googles Gcloud Bazel builder


In GCP building bazel based stuff on a cloudbuild.yaml, I am using the waitFor: ['-'] keyword to build in parallel and am using the bazel gcr.io/cloud-builders/bazel -builder. When I am trying to build multiple steps with the above builder and using the waitFor: ['-'] This would create a parallel build. The issue is when trying this way, I get the error as shown below and the build fails, however when I remove the waitFor: ['-'] keyword, the building occurs sequentially and the build goes through successfully. Is there any Bazel configuration I must change in the Gcloud's Bazel builder? Error is shown below while building in parallel:

Another command holds the client lock: 
pid=12
owner=client
cwd=/workspace

Waiting for it to complete...
Another command holds the client lock: 
pid=13
owner=client
cwd=/workspace

Waiting for it to complete...
Starting local Bazel server and connecting to it...

My cloudbuild.yaml looks like this below:

steps:

-   name: "gcr.io/cloud-builders/bazel"
    id: "Building Bazel components and Uploading the component manifest for ml_cmp_1 "
    entrypoint: "bash"
    args:
        - "-c"
        - |
            cmp_dir=components/train_test_split_1
            cmp_bazel_file="$cmp_dir/BUILD.bazel"
            bazel run --remote_cache=${_BAZEL_CACHE_URL} --google_default_credentials --define=PROJ_ID=${_PROJECT_ID} //$cmp_dir:container_push

    waitFor: ["-"]

-   name: "gcr.io/cloud-builders/bazel"
    id: "Building Bazel components and Uploading the component manifest for ml_cmp_2 "
    entrypoint: "bash"
    args:
        - "-c"
        - |
            cmp_dir=components/train_test_split_2
            cmp_bazel_file="$cmp_dir/BUILD.bazel"
            bazel run --remote_cache=${_BAZEL_CACHE_URL} --google_default_credentials --define=PROJ_ID=${_PROJECT_ID} //$cmp_dir:container_push

    waitFor: ["-"]


timeout: 86399s
logsBucket: gs://some_project_id_cloudbuild/logs
options:
  machineType: 'N1_HIGHCPU_8'
  substitution_option: 'ALLOW_LOOSE'

substitutions:
    _PROJECT_ID: "some_project_id"
    _BAZEL_CACHE_URL: "https://storage.googleapis.com/some_project_id_cloudbuild/bazel-cache"

Solution

  • If you want to run multiple bazel commands in parallel, they each need their own --output_base flag. That will make each build independent.

    If you want them to share intermediate outputs within the same build machine, a shared --disk_cache is the simplest approach. You could also set up a full remote cache which would cache outputs across build machines. If you want parallel builds to deduplicate common actions fully, you'll need to set up remote execution.