google-cloud-dataproc

Why is Dataproc not recognizing the argument spark.submit.deployMode=cluster?


I am submitting a Spark job to Dataproc this way:

gcloud dataproc jobs submit spark --cluster=$CLUSTER --region=$REGION --properties spark.jars.packages=com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.19.1, spark.submit.deployMode=cluster --class path.to.my.main.class --jars=path.to.jars -- "-p" "some_arg" "-z" "some_other_arg"

But I am getting this error:

ERROR: (gcloud.dataproc.jobs.submit.spark) unrecognized arguments: spark.submit.deployMode=cluster

Any idea why? Thank you in advance for your help.

It works fine this way (without cluster mode):

gcloud dataproc jobs submit spark --cluster=$CLUSTER --region=$REGION --properties spark.jars.packages=com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.19.1 --class path.to.my.main.class --jars=path.to.jars -- "-p" "some_arg" "-z" "some_other_arg"


Solution

  • You have a space between the first property and the second, so the shell splits the value there and passes spark.submit.deployMode=cluster to gcloud as a separate, stray argument. Either remove the space or surround the whole --properties value with quotes.

    Another option is to replace this with

    --packages com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.19.1 --properties spark.submit.deployMode=cluster
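The root cause is plain shell word splitting, not anything Dataproc-specific. A minimal sketch below demonstrates it with `printf` (the package name `pkg` is a stand-in); the corrected submit command, with the question's placeholder paths and variables, is shown in the comments:

```shell
# Quoting keeps both properties in a single --properties argument. A corrected
# submit (variables and paths are placeholders from the question) would be:
#
#   gcloud dataproc jobs submit spark --cluster=$CLUSTER --region=$REGION \
#     --properties="spark.jars.packages=com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.19.1,spark.submit.deployMode=cluster" \
#     --class path.to.my.main.class --jars=path.to.jars -- "-p" "some_arg" "-z" "some_other_arg"
#
# Why the original failed: the shell splits unquoted text at whitespace, so the
# comma-separated list became two words, and gcloud saw the second one as a
# stray positional argument. Counting the words makes this visible:
unquoted=$(printf '%s\n' spark.jars.packages=pkg, spark.submit.deployMode=cluster | wc -l)
quoted=$(printf '%s\n' "spark.jars.packages=pkg,spark.submit.deployMode=cluster" | wc -l)
echo "unquoted word count: $unquoted"   # 2 -- two separate arguments
echo "quoted word count:   $quoted"     # 1 -- one --properties value
```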