By "Google Batch" I'm referring to the new service Google launched about a month or so ago.
https://cloud.google.com/batch
I have a Python script that currently takes a few minutes to execute. However, with the data it will be processing over the next few months, this execution time will grow from minutes to hours. This is why I am not using Cloud Functions or Cloud Run to run this script: both have a maximum execution time of 60 minutes.
Google Batch came about recently and I wanted to explore this as a possible method to achieve what I'm looking for without just using Compute Engine.
However, documentation is sparse across the internet, and I can't find a way to "trigger" an already-created Batch job using Cloud Scheduler. I've already successfully created, by hand, a Batch job that runs my Docker image. Now I just need something to trigger this Batch job once a day, that's it. It would be wonderful if Cloud Scheduler could serve this purpose.
I've seen one article describing how to use GCP Workflows to create a new Batch job on a cron schedule driven by Cloud Scheduler. The issue with this is that it creates a new Batch job every time instead of simply re-running the existing one. To be honest, I can't even re-run an already-executed Batch job from the GCP console itself, so I don't know if it's even possible.
https://www.intertec.io/resource/python-script-on-gcp-batch
Lastly, I've explored the official Google Batch Python library and could not find any built-in function that lets me "call" a previously created Batch job and just re-run it.
I wrote this for you this morning as a guide.
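It uses Google's example in combination with Cloud Scheduler. Rather than re-running an existing Batch job, the Cloud Scheduler job POSTs the same job definition to the Batch API's create endpoint on every tick, so each run is a new Batch job built from the same template: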
# Used to correctly (!?) form Batch Job
import google.cloud.batch_v1.types
import google.cloud.scheduler_v1
import google.cloud.scheduler_v1.types
import os
project = os.getenv("PROJECT")
number = os.getenv("NUMBER")
location = os.getenv("LOCATION")
job = os.getenv("JOB")
# Batch Job
# Create Batch Job using batch_v1.types
# Alternatively, create this from scratch
batch_job = google.cloud.batch_v1.types.Job(
    priority=0,
    task_groups=[
        google.cloud.batch_v1.types.TaskGroup(
            task_spec=google.cloud.batch_v1.types.TaskSpec(
                runnables=[
                    google.cloud.batch_v1.types.Runnable(
                        container=google.cloud.batch_v1.types.Runnable.Container(
                            image_uri="gcr.io/google-containers/busybox",
                            entrypoint="/bin/sh",
                            commands=[
                                "-c",
                                "echo \"Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks.\""
                            ],
                        ),
                    ),
                ],
                compute_resource=google.cloud.batch_v1.types.ComputeResource(
                    cpu_milli=2000,
                    memory_mib=16,
                )
            ),
            task_count=1,
            parallelism=1,
        ),
    ],
    allocation_policy=google.cloud.batch_v1.types.AllocationPolicy(
        location=google.cloud.batch_v1.types.AllocationPolicy.LocationPolicy(
            allowed_locations=[
                f"regions/{location}",
            ],
        ),
        instances=[
            google.cloud.batch_v1.types.AllocationPolicy.InstancePolicyOrTemplate(
                policy=google.cloud.batch_v1.types.AllocationPolicy.InstancePolicy(
                    machine_type="e2-standard-2",
                ),
            ),
        ],
    ),
    labels={
        "stackoverflow": "73966292",
    },
    logs_policy=google.cloud.batch_v1.types.LogsPolicy(
        destination=google.cloud.batch_v1.types.LogsPolicy.Destination.CLOUD_LOGGING,
    ),
)
# Convert the Google Batch Job into JSON
# Google uses Proto Python
# https://proto-plus-python.readthedocs.io/en/stable/messages.html?highlight=JSON#serialization
batch_json=google.cloud.batch_v1.types.Job.to_json(batch_job)
print(batch_json)
# Convert JSON to bytes as required for body by Cloud Scheduler
body=batch_json.encode("utf-8")
# Run hourly on the hour (HH:00)
# For once a day instead, use e.g. "0 0 * * *" (00:00 in the job's time zone, UTC by default)
schedule = "0 * * * *"
parent = f"projects/{project}/locations/{location}"
name = f"{parent}/jobs/{job}"
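# Omit job_id: Batch job IDs must be unique within the project and location,
# so a fixed ?job_id= would be rejected on later runs while the earlier job
# still exists. Without it, Batch generates a new ID for each job it creates.
uri = f"https://batch.googleapis.com/v1/{parent}/jobs"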
service_account_email = f"{number}-compute@developer.gserviceaccount.com"
scheduler_job = google.cloud.scheduler_v1.types.Job(
    name=name,
    description="description",
    http_target=google.cloud.scheduler_v1.types.HttpTarget(
        uri=uri,
        http_method=google.cloud.scheduler_v1.types.HttpMethod.POST,
        # Cloud Scheduler sends application/octet-stream by default when a body is set
        headers={"Content-Type": "application/json"},
        # Google APIs (*.googleapis.com) expect an OAuth access token, not an OIDC token
        oauth_token=google.cloud.scheduler_v1.types.OAuthToken(
            service_account_email=service_account_email,
        ),
        body=body,
    ),
    schedule=schedule,
)
scheduler_json = google.cloud.scheduler_v1.Job.to_json(scheduler_job)
print(scheduler_json)
request = google.cloud.scheduler_v1.CreateJobRequest(
    parent=parent,
    job=scheduler_job,
)
scheduler_client = google.cloud.scheduler_v1.CloudSchedulerClient()
print(
    scheduler_client.create_job(
        request=request
    )
)
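If you would rather not depend on google-cloud-batch just to produce the request body, the same job can be written as a plain dict in the Batch REST API's JSON form (camelCase field names) and serialized with the standard library. This is a sketch under that assumption; `build_batch_job_body` is a hypothetical helper name, and the fields simply mirror `batch_job` above.

```python
import json


def build_batch_job_body(location: str,
                         image_uri: str = "gcr.io/google-containers/busybox") -> bytes:
    """Hypothetical helper: the batch_job above as Batch REST JSON, encoded as bytes."""
    job = {
        "priority": 0,
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        {
                            "container": {
                                "imageUri": image_uri,
                                "entrypoint": "/bin/sh",
                                "commands": [
                                    "-c",
                                    'echo "Hello world! This is task ${BATCH_TASK_INDEX}."',
                                ],
                            }
                        }
                    ],
                    "computeResource": {"cpuMilli": 2000, "memoryMib": 16},
                },
                "taskCount": 1,
                "parallelism": 1,
            }
        ],
        "allocationPolicy": {
            "location": {"allowedLocations": [f"regions/{location}"]},
            "instances": [{"policy": {"machineType": "e2-standard-2"}}],
        },
        "labels": {"stackoverflow": "73966292"},
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }
    # Cloud Scheduler's HttpTarget.body is bytes, so encode the JSON text
    return json.dumps(job).encode("utf-8")


if __name__ == "__main__":
    print(build_batch_job_body("us-west1").decode("utf-8"))
```

The returned bytes would go in `body=` of the `HttpTarget` in place of `batch_json.encode("utf-8")`; the rest of the script stays the same.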
You can test using:
BILLING="..."
PROJECT="..."
LOCATION="..." # E.g. us-west1
JOB="tester"
ACCOUNT="tester"
EMAIL="${ACCOUNT}@${PROJECT}.iam.gserviceaccount.com"
# Create Project and enable Billing
gcloud projects create ${PROJECT}
gcloud beta billing projects link ${PROJECT} \
--billing-account=${BILLING}
# Enable Batch, Cloud Scheduler and Compute Engine
SERVICES=(
"batch"
"cloudscheduler"
"compute"
)
for SERVICE in "${SERVICES[@]}"
do
  gcloud services enable ${SERVICE}.googleapis.com \
  --project=${PROJECT}
done
# Create Service Account
gcloud iam service-accounts create ${ACCOUNT} \
--project=${PROJECT}
gcloud iam service-accounts keys create ${PWD}/${ACCOUNT}.json \
--iam-account=${EMAIL} \
--project=${PROJECT}
# IAM
# https://cloud.google.com/iam/docs/understanding-roles#cloud-scheduler-roles
ROLES=(
"roles/batch.jobsEditor"
"roles/cloudscheduler.admin"
)
for ROLE in "${ROLES[@]}"
do
  gcloud projects add-iam-policy-binding ${PROJECT} \
  --member=serviceAccount:${EMAIL} \
  --role=${ROLE}
done
# ActAs
NUMBER=$(\
gcloud projects describe ${PROJECT} \
--format="value(projectNumber)")
COMPUTE_ENGINE="${NUMBER}-compute@developer.gserviceaccount.com"
gcloud iam service-accounts add-iam-policy-binding ${COMPUTE_ENGINE} \
--member=serviceAccount:${EMAIL} \
--role="roles/iam.serviceAccountUser" \
--project=${PROJECT}
Then:
python3 -m venv venv
source venv/bin/activate
# Or requirements.txt
python3 -m pip install google-cloud-batch
python3 -m pip install google-cloud-scheduler
export JOB
export LOCATION
export NUMBER
export PROJECT
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/${ACCOUNT}.json
# main.py is the Python script above
python3 main.py
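The question also asks about re-running a job that was created by hand. As far as I know the Batch API offers no "re-run" call, but you can approximate one by copying the existing job's definition and submitting it under a new ID. The sketch below assumes the job was described with `gcloud batch jobs describe JOB --location=LOCATION --format=json` and strips the output-only fields the service sets itself (`name`, `uid`, `status`, `createTime`, `updateTime`, and each task group's `name`). The script name `rerun.py` is hypothetical, and fields the server filled in on the original job may still need trimming.

```python
import json
import sys

# Output-only Job fields set by the Batch service (REST/JSON names);
# they should not be sent back when creating a new job.
OUTPUT_ONLY_JOB_FIELDS = ("name", "uid", "status", "createTime", "updateTime")


def job_config_from_description(described: dict) -> dict:
    """Turn a described Batch job (JSON dict) into a config for a new job."""
    config = {k: v for k, v in described.items() if k not in OUTPUT_ONLY_JOB_FIELDS}
    if "taskGroups" in config:
        # TaskGroup.name is output-only as well
        config["taskGroups"] = [
            {k: v for k, v in group.items() if k != "name"}
            for group in config["taskGroups"]
        ]
    return config


if __name__ == "__main__":
    # Read the described job from stdin and write the new config to stdout
    described = json.load(sys.stdin)
    json.dump(job_config_from_description(described), sys.stdout, indent=2)
```

For example, pipe `gcloud batch jobs describe ${JOB} --location=${LOCATION} --format=json` into `python3 rerun.py > config.json`, then run `gcloud batch jobs submit ${JOB}-$(date +%s) --location=${LOCATION} --config=config.json`; the timestamp suffix keeps each new job ID unique.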