Search code examples
apache-sparkcloudera-cdhspark-submit

spark2-submit throwing error with multiple packages (--packages)


I'm trying to submit following Spark2 job on CDH 5.16 cluster and it's only taking first parameter of --packages option and throwing error for second parameter

spark2-submit --packages com.databricks:spark-xml_2.11:0.4.1, com.databricks:spark-csv_2.11:1.5.0 /path/to/python-script


Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR com.databricks:spark-csv_2.11:1.5.0 with URI com.databricks. Please specify a class through --class.
        at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:657)
        at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:224)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:116)
        at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$1.<init>(SparkSubmit.scala:911)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:911)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:81)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Running this job in CDH5.16 cluster and spark installed with Spark2 CSD

Thanks in advance.


Solution

  • Don't give space between packages

    spark2-submit --packages com.databricks:spark-xml_2.11:0.4.1,com.databricks:spark-csv_2.11:1.5.0 /path/to/python-script