Tags: exception, google-cloud-platform, terraform, gcloud

Is there any way to catch errors in Google Cloud Build?


I am trying to deploy resources to Google Cloud Platform.

Most of my CI pipeline uses Terraform, but one step needs something that only works through the gcloud CLI tool (or, presumably, a REST API call, but I haven't tested that).

The specific thing I am trying to do is bind a policy tag to a BigQuery dataset that is multi-regional ('US'). There is no way to do this in Terraform (as of summer 2024). In fact, it cannot even be done with the standard gcloud commands, but it can be done with gcloud alpha.

My command is the following:

# $tag_value is of the form "tagValues/12345678"
# $parent is the BigQuery dataset, of the form "//bigquery.googleapis.com/projects/<project>/datasets/<dataset>"
gcloud alpha resource-manager tags bindings create \
                --tag-value="$tag_value" \
                --parent="$parent" \
                --location=US

This command works. However, if the binding already exists, gcloud exits with an error:

ERROR: (gcloud.alpha.resource-manager.tags.bindings.create) ALREADY_EXISTS: A binding already exists between the given resource and TagValue.

This is a problem because the build that runs when the change is merged into our main branch fails whenever the binding is already there. Since I have usually already run the build while updating the PR, the merge build nearly always fails, and it is a false failure.

So I would like to catch any error that says ALREADY_EXISTS and let that fail silently, since to me that is not really an error.

If it helps, here is the cloud build script:

# This pipeline is adding infrastructure defined in /2-terraform
steps:
  - id: 'terraform init'
    name: 'hashicorp/terraform:1.7.5'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
          cd infrastructure/2-insights-infra/2-terraform
          terraform init

  - id: 'terraform plan'
    name: 'hashicorp/terraform:1.7.5'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
          cd infrastructure/2-insights-infra/2-terraform
          terraform plan

  - id: 'terraform apply'
    name: 'hashicorp/terraform:1.7.5'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
          cd infrastructure/2-insights-infra/2-terraform
          terraform apply -auto-approve

  - id: 'capture terraform output for next step'
    name: 'hashicorp/terraform:1.7.5'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
          cd infrastructure/2-insights-infra/2-terraform
          echo "$(terraform output project | tr -d '"')" > /workspace/project.txt
          echo "$(terraform output -json tag_values_by_dataset)" > /workspace/tag_values_by_dataset.json
          echo "$(terraform output tagged_datasets | tr -d '",][')" > /workspace/datasets.txt

  - id: 'bind policy tags to resources'
    name: 'gcr.io/cloud-builders/gcloud'
    entrypoint: bash
    args:
      - '-c'
      - |
          project=$(cat /workspace/project.txt)
          tag_values_by_dataset=$(cat /workspace/tag_values_by_dataset.json)
          datasets=$(cat /workspace/datasets.txt)
          for dataset in $datasets; do
            parent=$(echo "//bigquery.googleapis.com/projects/$project/datasets/$dataset")
            tag_values=$(echo "$tag_values_by_dataset" | sed -r "s|.*$dataset\":[^:]*:\"tagValues([^\"]+)\".*|tagValues\1|")
            for tag_value in $tag_values; do
              gcloud alpha resource-manager tags bindings create \
                --tag-value=$tag_value \
                --parent=$parent \
                --location=$_LOCATION
            done
          done

substitutions:
  _LOCATION: US

This is pretty ugly, and I'm not happy about using the gcloud command, but I have no idea when Terraform might add the capability, so it's the best I have, for now. However, it would at least be acceptable if I could get it to not generate an error when it is actually working.
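
As an aside on the `sed` extraction in the last step: the same pairs could be read by parsing the JSON directly. The following is only a minimal sketch, assuming that `terraform output -json tag_values_by_dataset` yields an object mapping each dataset name to an object whose values are strings such as "tagValues/12345678". The helper name `iter_bindings` and the sample data are hypothetical.

```python
import json


def iter_bindings(project, tag_values_json):
    """Yield (parent, tag_value) pairs from the JSON text of the Terraform output.

    Assumed shape: {"<dataset>": {"<any key>": "tagValues/<id>", ...}, ...}
    """
    tag_values_by_dataset = json.loads(tag_values_json)
    for dataset, tags in tag_values_by_dataset.items():
        # Full resource name of the BigQuery dataset, as used for --parent
        parent = f"//bigquery.googleapis.com/projects/{project}/datasets/{dataset}"
        for tag_value in tags.values():
            yield parent, tag_value
```

Each yielded pair corresponds to one `gcloud alpha resource-manager tags bindings create` call, with no regular expression involved.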


Solution

  • Thanks to @DazWilkin, I resolved this by rewriting the step in Python. It's now working, so I'm sharing the Python step of the solution (the other steps are the same as shown above):

      ### Policy tags are being bound using `gcloud` for now, because Terraform does not
      ### (yet) support binding policy tags to multi-region resources.
      - id: 'bind policy tags to resources'
        name: 'gcr.io/cloud-builders/gcloud'
        entrypoint: python3
        args:
          - '-c'
          - |
              from json import loads
              from subprocess import run, PIPE, CalledProcessError
              with open("/workspace/project.txt") as fh:
                project = fh.read().strip() # For some reason, `json` is not acting normal
              with open("/workspace/tag_values_by_dataset.json") as fh:
                tag_values_by_dataset = loads(fh.read().strip()) # again, `json` acting oddly
              for dataset, tags in tag_values_by_dataset.items():
                for tag_value in tags.values():
                  parent = f"//bigquery.googleapis.com/projects/{project}/datasets/{dataset}"
                  print(f"Attempt to bind tag {tag_value} to resource {parent} in $_LOCATION :")
                  ### Should be possible to use resourcemanager_v3.TagBindingsClient.create_tag_binding() here
                  try:
                    result = run(
                      ["gcloud", "alpha", "resource-manager", "tags", "bindings", "create",
                      "--tag-value", tag_value, "--parent", parent, "--location",
                      "$_LOCATION"],
                      stdout=PIPE, stderr=PIPE, check=True, text=True)
                    print(f"Tag {tag_value} successfully bound to {parent} in $_LOCATION")
                    print(f"\t{result}")
                  except CalledProcessError as err:
                    # An existing binding is not a failure here; anything else is re-raised.
                    if "EXISTING_BINDING" in err.stderr:
                      print(f"Tag {tag_value} already bound to {parent} in $_LOCATION")
                    else:
                      raise
    

    $_LOCATION is a Cloud Build substitution, which Cloud Build replaces with its value in the step's args before the step runs:

    substitutions:
      _LOCATION: US
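
Because Cloud Build substitutes `$_LOCATION` into the script text before Python runs, the interpreter only ever sees the literal value. An alternative is to read the value from the environment, roughly as sketched here. This assumes the build step also declares `env: ['LOCATION=$_LOCATION']`. The variable name `LOCATION` and the helper `resolve_location` are illustrative and not part of the original solution.

```python
import os


def resolve_location(environ=None, default="US"):
    """Return the binding location from the (assumed) LOCATION environment variable.

    Falls back to `default` when the variable is unset or blank.
    """
    if environ is None:
        environ = os.environ
    location = environ.get("LOCATION", "").strip()
    return location or default
```

This keeps the Python source free of substitution tokens, so the same script can also be run outside Cloud Build.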
    

    I used "EXISTING_BINDING" as the search term instead of "ALREADY_EXISTS" because "ALREADY_EXISTS" is not an exception type or anything similarly useful; it is just text in the error output. The full error message also includes the text "EXISTING_BINDING", which I feel captures this error more precisely and so reduces the chance of accidentally swallowing other errors.
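
The matching logic can be pulled out into a small helper, which makes it easy to check without calling gcloud. This is only a sketch of the approach described above. The function name `run_tolerating` is hypothetical, and the default marker is simply the "EXISTING_BINDING" text that the author reports seeing in the full error output.

```python
import subprocess
import sys


def run_tolerating(cmd, marker="EXISTING_BINDING"):
    """Run `cmd`; return True on success, False if it failed with `marker` in stderr.

    Any other non-zero exit is re-raised as subprocess.CalledProcessError.
    """
    try:
        subprocess.run(cmd, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as err:
        if marker in (err.stderr or ""):
            return False  # tolerated: the binding was already there
        raise
    return True


if __name__ == "__main__":
    # Stand-in for the gcloud call: a child process that fails with the marker text.
    fake = [sys.executable, "-c",
            "import sys; sys.stderr.write('ALREADY_EXISTS EXISTING_BINDING'); sys.exit(1)"]
    print(run_tolerating(fake))
```

Matching on a narrow marker string means an unrelated failure, such as a permissions error, still stops the build.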