
Executing Dataflow Template via Google Cloud Scheduler


I am trying to execute a custom Dataflow template via Google Cloud Scheduler, but when I run the job I get an UNAUTHENTICATED exception.

How do I give Google Cloud Scheduler access to the Google Dataflow API?

Here is the URL and POST body I am using:

https://dataflow.googleapis.com/v1b3/projects/<<PROJECT>>/templates:launch?gcsPath=gs://<<GCS_BUCKET>>/template

{
  "jobName": "job-name-scheduled",
  "parameters": {
    "param1": "parmval1"
  },
  "environment": {
    "tempLocation": "gs://<<BUCKET>>/temp",
    "region": "us-east1"
  }
}


Solution

  • The Cloud Scheduler documentation points out that "Targeted HTTP endpoints must be publicly accessible".

    Normally, to create that kind of Dataflow job, you would submit a request like this:

    # Launch the public Word_Count template; gcloud supplies the bearer token.
    curl -X POST \
      'https://dataflow.googleapis.com/v1b3/projects/<project>/templates:launch?gcsPath=gs://dataflow-templates/latest/Word_Count' \
      -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      -H 'Content-Type: application/json' \
      --data '{
        "jobName": "scheduled_job",
        "parameters": {
          "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
          "output": "gs://<bucket>/output/my_output"
        },
        "environment": { "zone": "us-central1-f" }
      }'
    

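    However, you can't send the authorization token through Cloud Scheduler, so the request reaches the Dataflow API without credentials and is rejected as UNAUTHENTICATED.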

    For scheduling Dataflow jobs, you can see this answer instead.
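
To make the shape of the launch request concrete, here is a minimal Python sketch that only assembles the same kind of request the curl command sends; it does not send anything. The function name `build_launch_request` and the values in the example call (`my-project`, `my-bucket`, `TOKEN`) are assumptions for illustration and do not come from the original answer. A real access token would have to come from your own credentials.

```python
import json
import urllib.parse
import urllib.request


def build_launch_request(project, gcs_path, job_name, parameters, environment, access_token):
    """Assemble (but do not send) a Dataflow templates:launch request.

    Hypothetical helper: it mirrors the URL, headers, and JSON body
    used in the curl example.
    """
    # gcsPath is passed as a query parameter, so it is URL-encoded here.
    query = urllib.parse.urlencode({"gcsPath": gcs_path})
    url = (
        "https://dataflow.googleapis.com/v1b3/projects/"
        f"{urllib.parse.quote(project, safe='')}/templates:launch?{query}"
    )
    body = {
        "jobName": job_name,
        "parameters": parameters,
        "environment": environment,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Without this header the API has no credentials to check.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values only; nothing is sent over the network.
    request = build_launch_request(
        project="my-project",
        gcs_path="gs://my-bucket/template",
        job_name="job-name-scheduled",
        parameters={"param1": "parmval1"},
        environment={"tempLocation": "gs://my-bucket/temp"},
        access_token="TOKEN",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would actually issue the call. Building the request separately makes it easy to inspect the URL, headers, and body first, which helps when checking whether an UNAUTHENTICATED error comes from a missing `Authorization` header.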