Search code examples
google-cloud-platformgoogle-workflowsgoogle-cloud-run-jobs

Workflow to run cloud run job, while not waiting for completion and proceeds to next step


I have a workflow with cloud run job creation, i want to execute the job at some step, but not waiting until it completes. The job is meant to be long running job. But i dont want it to block my workflow. I have parallel steps but this path of running the batch job will still waits (my cloud run job is setting for 7 days timeout)

Is there any workaround or method?

i have considered cloud task / pub/sub trigger (not working cause cloud run job has no http endpoint)

now my workflow has the step to googleapis.run.v1.namespaces.jobs.create, but after creating cloud run job it's not executing automatically. (i cant use googleapis.run.v1.namespaces.jobs.run because, again, that will be somewhat like 'wait until done or timeout' and workflow will have to keeps running)

This is kind of frustrating moments without any available solution via any of the gcp services.

part of my workflow is as below to show the parallel steps

- parallel_steps:
                  parallel:
                    branches:
                      - start_monitor:
                          steps:
                            - monitor_with_error_handling:
                                try:
                                  steps:
                                    - start_monitor_job:
                                        call: googleapis.run.v1.namespaces.jobs.create
                                        args:
                                          parent: $${"namespaces/" + gcp_project_id}
                                          location: $${gcp_region}
                                          body:
                                            apiVersion: "run.googleapis.com/v1"
                                            kind: "Job"
                                            metadata:
                                              name: $${"monitor-" + "batchjob_", "")}
                                              annotations:
                                                run.googleapis.com/launch-stage: BETA
                                            spec:
                                              template:
                                                spec:
                                                  template:
                                                    spec:
                                                      containers:
                                                        - image: $${gcp_region + "-docker.pkg.dev/" + gcp_project_id + "/repo-name/repo-image:" + env_name}
                                                          env: ... some env vars.
                                                      timeoutSeconds: 172800
                                        result: monitor_job_result

                                    - log_monitor_job_start:
                                        call: sys.log
                                        args:
                                          text: '$${"Started monitor job: " + json.encode_to_string(monitor_job_result)}'
                                          severity: INFO

                                except:
                                  as: e
                                  steps:
                                    - log_monitor_job_error:
                                        call: sys.log
                                        args:
                                          severity: ERROR
                                          text: '$${"Monitor job failed: " + json.encode_to_string(e)}'

                      - continue_workflow:
                          steps:
                            - pass:
                                assign:
                                  - _: null

... subsequent steps

Update to show what i mean - so with the parallel steps, if i just execute the job with

- execute_monitor:
                                        call: googleapis.run.v1.namespaces.jobs.run
                                        args:
                                          name: $${"namespaces/" + gcp_project_id + "/jobs/job-name-" + env_name}
                                          location: $${gcp_region}
                                          body:

the job will be executed, and the parallel step will wait till the job is done (which might be super long running. but at the same time i want to continue with other steps in the workflow.

enter image description here


Solution

  • I endeded up using a workaround. This is because of the workflow client libraries limitation. It does not support all capabilities on par with gcloud for example. Hence nothing much we can customize with the configuration.

    To simplify, we have to use

    1. gcloud command in workflow (i use "run cloud deploy" for my use case specifically due to some of my requirements, but "run jobs create" could work as well
    2. leverage the gcloud flag options that is to "--execute-now"

    https://cloud.google.com/workflows/docs/samples/workflows-cloud-build-run-gcloud

    Example

    - execute_monitor_job_async:
        call: gcloud
        args:
          args: $${"run jobs deploy " + monitor_job + " --image " + monitor_job_image + " --region " + gcp_region + " --update-env-vars " + full_env_vars + " --task-timeout 86400s --execute-now"}
    

    and defining gcloud:

    gcloud:
        params: [args]
        steps:
        - create_build:
            call: googleapis.cloudbuild.v1.projects.builds.create
            args:
              projectId: $${sys.get_env("gcp_project_id")}
              parent: $${"projects/" + sys.get_env("gcp_project_id") + "/locations/global"}
              body:
                serviceAccount: $${"projects/" + sys.get_env("gcp_project_id") + "/serviceAccounts/" + sys.get_env("cloud_build_sa")}
                options:
                  logging: CLOUD_LOGGING_ONLY
                steps:
                - name: gcr.io/google.com/cloudsdktool/cloud-sdk
                  entrypoint: /bin/bash
                  args: $${["-c", "gcloud " + args + " > $$BUILDER_OUTPUT/output"]}
            result: result_builds_create
        - return_build_result:
            return: $${text.split(text.decode(base64.decode(result_builds_create.metadata.build.results.buildStepOutputs[0])), "\n")}
    

    (Note: the "full_env_vars" details are omitted here, it's just various env vars i need for my run jobs to pass in)

    The result: It will not block the workflow steps anymore. It will proceed to next step

    If anyone want to read more there's a good explanation article by Mark - https://medium.com/@markwkiehl/google-cloud-run-jobs-scheduler-22a4e9252cf0