I have a workflow that creates a Cloud Run job, and I want to execute that job at some step without waiting for it to complete. The job is meant to be long-running, but I don't want it to block my workflow. I have parallel steps, yet the branch that runs the batch job still waits (my Cloud Run job is configured with a 7-day timeout).
Is there any workaround or method?
I have considered a Cloud Tasks / Pub/Sub trigger, but that doesn't work because a Cloud Run job has no HTTP endpoint.
Right now my workflow has a step calling googleapis.run.v1.namespaces.jobs.create, but after the Cloud Run job is created it does not execute automatically. (I can't use googleapis.run.v1.namespaces.jobs.run because, again, that behaves like "wait until done or timeout", and the workflow would have to keep running.)
This is a frustrating situation, with no ready-made solution in any of the GCP services.
Part of my workflow is shown below to illustrate the parallel steps:
- parallel_steps:
    parallel:
      branches:
        - start_monitor:
            steps:
              - monitor_with_error_handling:
                  try:
                    steps:
                      - start_monitor_job:
                          call: googleapis.run.v1.namespaces.jobs.create
                          args:
                            parent: $${"namespaces/" + gcp_project_id}
                            location: $${gcp_region}
                            body:
                              apiVersion: "run.googleapis.com/v1"
                              kind: "Job"
                              metadata:
                                name: $${"monitor-" + "batchjob_", "")}
                                annotations:
                                  run.googleapis.com/launch-stage: BETA
                              spec:
                                template:
                                  spec:
                                    template:
                                      spec:
                                        containers:
                                          - image: $${gcp_region + "-docker.pkg.dev/" + gcp_project_id + "/repo-name/repo-image:" + env_name}
                                            env: ... some env vars
                                        timeoutSeconds: 172800
                          result: monitor_job_result
                      - log_monitor_job_start:
                          call: sys.log
                          args:
                            text: '$${"Started monitor job: " + json.encode_to_string(monitor_job_result)}'
                            severity: INFO
                  except:
                    as: e
                    steps:
                      - log_monitor_job_error:
                          call: sys.log
                          args:
                            severity: ERROR
                            text: '$${"Monitor job failed: " + json.encode_to_string(e)}'
        - continue_workflow:
            steps:
              - pass:
                  assign:
                    - _: null
              ... subsequent steps
Update to show what I mean: with the parallel steps, if I just execute the job with
- execute_monitor:
    call: googleapis.run.v1.namespaces.jobs.run
    args:
      name: $${"namespaces/" + gcp_project_id + "/jobs/job-name-" + env_name}
      location: $${gcp_region}
      body:
the job will be executed, but that parallel step will wait until the job is done (which might be extremely long-running), while I want to continue with the other steps in the workflow at the same time.
I ended up using a workaround. This is because of a limitation of the Workflows connectors / client libraries: they don't support all capabilities on par with gcloud, for example, so there isn't much we can customize in the configuration.
In short, we have to run gcloud through Cloud Build, as in this sample: https://cloud.google.com/workflows/docs/samples/workflows-cloud-build-run-gcloud
Example
- execute_monitor_job_async:
    call: gcloud
    args:
      args: $${"run jobs deploy " + monitor_job + " --image " + monitor_job_image + " --region " + gcp_region + " --update-env-vars " + full_env_vars + " --task-timeout 86400s --execute-now"}
and defining the gcloud subworkflow it calls. (The key point: gcloud run jobs deploy --execute-now deploys the job and kicks off an execution, but does not wait for that execution to finish, so the Cloud Build build, and therefore the workflow step, returns quickly instead of blocking for the lifetime of the job.)
gcloud:
  params: [args]
  steps:
    - create_build:
        call: googleapis.cloudbuild.v1.projects.builds.create
        args:
          projectId: $${sys.get_env("gcp_project_id")}
          parent: $${"projects/" + sys.get_env("gcp_project_id") + "/locations/global"}
          body:
            serviceAccount: $${"projects/" + sys.get_env("gcp_project_id") + "/serviceAccounts/" + sys.get_env("cloud_build_sa")}
            options:
              logging: CLOUD_LOGGING_ONLY
            steps:
              - name: gcr.io/google.com/cloudsdktool/cloud-sdk
                entrypoint: /bin/bash
                args: $${["-c", "gcloud " + args + " > $$BUILDER_OUTPUT/output"]}
        result: result_builds_create
    - return_build_result:
        return: $${text.split(text.decode(base64.decode(result_builds_create.metadata.build.results.buildStepOutputs[0])), "\n")}
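Since in my case the job already exists (it was created earlier via jobs.create), the same gcloud helper can also just start the existing job rather than re-deploying it. A minimal sketch, assuming the job name is held in monitor_job; gcloud run jobs execute has an --async flag that returns as soon as the execution has been created (verify the exact flags for your gcloud version):

- execute_existing_job_async:
    call: gcloud
    args:
      args: $${"run jobs execute " + monitor_job + " --region " + gcp_region + " --async"}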
(Note: the "full_env_vars" details are omitted here; it's just the various env vars I need to pass to my Run job.)
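For reference, --update-env-vars expects comma-separated KEY=VALUE pairs, so full_env_vars is simply a string in that shape. A hypothetical sketch with placeholder keys, not my real variables:

- build_env_vars:
    assign:
      - full_env_vars: $${"SOME_KEY=" + env_name + ",ANOTHER_KEY=" + gcp_project_id}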
The result: it no longer blocks the workflow steps; after kicking off the job, the workflow proceeds to the next step.
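To tie it back to the original question, here is a sketch (reusing only the names from above) of how the async step slots into the parallel branch, so the continue_workflow branch is no longer held up by the monitor job:

- parallel_steps:
    parallel:
      branches:
        - start_monitor:
            steps:
              - execute_monitor_job_async:
                  call: gcloud
                  args:
                    args: $${"run jobs deploy " + monitor_job + " --image " + monitor_job_image + " --region " + gcp_region + " --update-env-vars " + full_env_vars + " --task-timeout 86400s --execute-now"}
                  result: monitor_job_output
              - log_monitor_job_start:
                  call: sys.log
                  args:
                    severity: INFO
                    text: '$${"Started monitor job: " + json.encode_to_string(monitor_job_output)}'
        - continue_workflow:
            steps:
              ... subsequent steps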
If anyone wants to read more, there's a good explanation article by Mark: https://medium.com/@markwkiehl/google-cloud-run-jobs-scheduler-22a4e9252cf0