google-cloud-dataproc

Give custom job_id to Google Dataproc cluster for running pig/hive/spark jobs


Is there a flag available to give a custom job_id to Dataproc jobs? I am using this command to run Pig jobs:

gcloud dataproc jobs submit pig --cluster my_cluster --file my_queries.pig

I use similar commands to submit PySpark/Hive jobs.

This command generates a job_id on its own, which makes tracking the jobs later difficult.


Solution

  • Reading the gcloud source, you can see that the argument called id is used as the job name:

    https://github.com/google-cloud-sdk/google-cloud-sdk/blob/master/lib/googlecloudsdk/command_lib/dataproc/jobs/submitter.py#L56

    Therefore, you only need to add --id to your gcloud command:

    gcloud dataproc jobs submit spark --id this-is-my-job-name --cluster my-cluster --class com.myClass.Main --jars gs://my.jar
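The same flag works for the Pig submission from the question. A minimal sketch, assuming a hypothetical cluster name my_cluster and a date-stamped id scheme (the gcloud calls are shown as comments since they need a real cluster and credentials); a predictable id lets you look the job up later with gcloud dataproc jobs describe:

```shell
# Hypothetical id scheme: embed the date so each daily run gets a unique,
# predictable job id (Dataproc ids allow letters, digits, underscores, hyphens).
JOB_ID="daily-pig-etl-$(date +%Y%m%d)"

# Submit the Pig job under the custom id (requires gcloud and a real cluster):
# gcloud dataproc jobs submit pig --id "${JOB_ID}" --cluster my_cluster --file my_queries.pig

# Later, track the job by the id you chose instead of a generated one:
# gcloud dataproc jobs describe "${JOB_ID}"

echo "${JOB_ID}"
```

Note that submitting twice with the same id fails, so the id must be unique per cluster and region; including a date or run number, as above, is one way to guarantee that.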