Tags: mongodb, pyspark, spark-submit

spark-submit configuration: --jars and --packages


Can anyone tell me how to use --jars and --packages with spark-submit?

  1. I'm working on a web application.
  2. On the engine side I use Spark with MongoDB (spark-mongo).

bin/spark-submit --properties-file config.properties --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1,com.crealytics:spark-excel_2.11:0.13.1 /home/PycharmProjects/EngineSpark.py 8dh1243sg2636hlf38m

  • I'm using the command above, but it downloads the jars and packages from the Maven repository every time.
  • My concern is that it fails with an error when I'm offline.
  • Is there a way to download them only once, so they don't have to be fetched on each run?
  • Any suggestions on how to deal with this?

Solution

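  • Get all the required jar files, then pass them as a parameter to spark-submit.

    This way you don't need to download the files every time you submit the Spark job.

    Use --jars instead of --packages. Unlike --packages, --jars does not resolve transitive dependencies, so list every jar the job needs, separated by commas: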

    bin/spark-submit --properties-file config.properties --jars /home/PycharmProjects/spark-excel_2.11-0.11.1.jar,/home/PycharmProjects/mongo-spark-connector_2.11-2.4.1.jar /home/PycharmProjects/EngineSpark.py 8dh1243sg2636hlf38m
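
    If the list of jars grows, typing the comma-separated --jars value by hand gets error-prone. Below is a minimal sketch of a helper that builds the same kind of command from a local folder of jars. The function name build_submit_command and its parameters are hypothetical, chosen for illustration only; the only spark-submit options it relies on are --properties-file and --jars, which are the ones used in the command above.

```python
from pathlib import Path


def build_submit_command(jar_dir, app, app_args=(), properties_file=None,
                         spark_submit="bin/spark-submit"):
    """Build a spark-submit argument list that passes local jars via --jars.

    Hypothetical helper: jar_dir is assumed to be a folder you filled once
    with every jar the job needs (including transitive dependencies).
    """
    # Collect the jars in a stable (sorted) order so the command is reproducible.
    jars = sorted(str(p) for p in Path(jar_dir).glob("*.jar"))
    if not jars:
        raise FileNotFoundError(f"no .jar files found in {jar_dir}")

    cmd = [spark_submit]
    if properties_file:
        cmd += ["--properties-file", properties_file]
    # --jars expects one comma-separated value with no spaces.
    cmd += ["--jars", ",".join(jars)]
    # The application script and its own arguments come last.
    cmd.append(app)
    cmd.extend(app_args)
    return cmd
```

    The returned list can be passed to subprocess.run(cmd, check=True) from the Spark installation directory, or joined with spaces to inspect the command before running it. Nothing here contacts Maven, so it should behave the same offline as long as the jars are already in the folder.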