Tags: google-cloud-platform, google-cloud-dataproc, google-cloud-dataproc-serverless

Serverless Dataproc Error- Batch ID is required


I am trying to submit a Spark job to Serverless Dataproc using the REST API, following https://cloud.google.com/dataproc-serverless/docs/quickstarts/spark-batch#dataproc_serverless_create_batch_workload-drest:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://dataproc.googleapis.com/v1/projects/project-id/locations/region/batches"

I get this error response:

{
  "error": {
    "code": 400,
    "message": "Batch ID is required",
    "status": "INVALID_ARGUMENT"
  }
}

What am I missing here?


Solution

  • I tested with gcloud --log-http:

    $ gcloud dataproc batches submit spark --log-http \
        --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
        --class=org.apache.spark.examples.SparkPi \
        -- 1000
    ...
    ==== request start ====
    uri: https://dataproc.googleapis.com/v1/projects/my-project/locations/us-west1/batches?alt=json&batchId=21dd24ca279a4603926d4e59d65bfaf9&requestId=21dd24ca279a4603926d4e59d65bfaf9
    method: POST
    ...
    

    Note the batchId=21dd24ca279a4603926d4e59d65bfaf9 in the URL.

    I also tested setting the ID manually with --batch:

    $ gcloud dataproc batches submit spark --log-http  \
      --batch=foobar \
      --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
      --class=org.apache.spark.examples.SparkPi \
      -- 1000
    
    ...
    ==== request start ====
    uri: https://dataproc.googleapis.com/v1/projects/my-project/locations/us-west1/batches?alt=json&batchId=foobar&requestId=c7b5a753cac4483da21b1ba1c6c2a2d1
    method: POST
    ...
    

    So the REST API requires a batchId query parameter in the URL; gcloud generates one automatically, which is why the quickstart command fails when the parameter is omitted from a raw curl request. When calling the REST API directly, you need to append batchId to the URL yourself.
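
    Based on the logged requests above, the original curl command should work once a batchId query parameter is appended. Here, my-batch-001 is an example ID of my choosing; project-id and region are the same placeholders as in the question:

    ```shell
    # Same request as in the question, plus the batchId query parameter
    # that gcloud was adding automatically.
    curl -X POST \
      -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
      -H "Content-Type: application/json; charset=utf-8" \
      -d @request.json \
      "https://dataproc.googleapis.com/v1/projects/project-id/locations/region/batches?batchId=my-batch-001"
    ```

    Note that batch IDs must be unique per project and location, so reusing an ID from a previous (even deleted) batch can fail; generating a fresh ID per submission, as gcloud does, avoids this.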