According to the Spark on Mesos docs, one needs to set spark.executor.uri
to point to a Spark distribution:
    import org.apache.spark.SparkConf

    // Point the Mesos executors at the uploaded Spark distribution
    val conf = new SparkConf()
      .setMaster("mesos://HOST:5050")
      .setAppName("My app")
      .set("spark.executor.uri", "<path to spark-1.4.1.tar.gz uploaded above>")
The docs also note that one can build a custom version of the Spark distribution.
My question is whether it is possible and desirable to pre-package external libraries, which will be used in almost all of the job JARs I'll submit via spark-submit, in order to reduce the time sbt assembly needs to package the fat JARs.
If so, how can this be achieved? Generally speaking, are there any hints on how fat JAR generation during job submission can be sped up?
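One widely used hint for this, sketched here under assumptions rather than taken from the Mesos docs: mark dependencies that are already on the cluster's classpath as "provided" in build.sbt, so that sbt assembly leaves them out of the fat JAR. The project name and Scala version below are hypothetical, and the sketch assumes the sbt-assembly plugin is already enabled in project/plugins.sbt.

```scala
// build.sbt (sketch; name and scalaVersion are illustrative assumptions)
name := "my-spark-job"

scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  // "provided": available at compile time, but not bundled by sbt assembly,
  // because the Spark distribution on the cluster already ships it
  "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"
)
```

The same scope could in principle be applied to any external library that has been baked into a custom Spark distribution; fewer bundled dependencies means less work for sbt assembly and a smaller JAR to submit.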
The background is that I want to generate code for Spark jobs, submit those jobs right away, and show the results asynchronously in a browser frontend. The frontend part shouldn't be too complicated, but I wonder how the backend part can be achieved.
After I discovered the Spark JobServer project, I decided that this is the most suitable one for my use case.
It supports dynamic context creation via a REST API, as well as adding JARs to the newly created context manually or programmatically. It is also capable of running low-latency synchronous jobs, which is exactly what I need.
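To make that workflow concrete, here is a minimal sketch of a client for the three steps (upload a JAR, create a context, run a job synchronously). It is not part of JobServer itself: the object and method names are my own, the base URL assumes JobServer on its default port 8090, and the endpoint paths follow the JobServer README as I recall it, so check them against the version you run.

```scala
import java.net.{HttpURLConnection, URL, URLEncoder}
import scala.io.Source

// Hypothetical helper, not a JobServer API: builds REST URLs and POSTs to them.
object JobServerClient {
  // Assumption: JobServer runs locally on its default port
  val base = "http://localhost:8090"

  private def enc(s: String): String = URLEncoder.encode(s, "UTF-8")

  // URL for uploading a job JAR under an application name
  def jarUrl(appName: String): String = s"$base/jars/${enc(appName)}"

  // URL for creating a named, long-lived context
  def contextUrl(name: String): String = s"$base/contexts/${enc(name)}"

  // URL for running a job in an existing context and waiting for its result
  def syncJobUrl(appName: String, classPath: String, context: String): String =
    s"$base/jobs?appName=${enc(appName)}&classPath=${enc(classPath)}" +
      s"&context=${enc(context)}&sync=true"

  // POST raw bytes and return the response body.
  // getInputStream throws an IOException on 4xx/5xx responses.
  def post(url: String, body: Array[Byte]): String = {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    val out = conn.getOutputStream
    try out.write(body) finally out.close()
    val in = conn.getInputStream
    try Source.fromInputStream(in, "UTF-8").mkString finally in.close()
  }
}
```

With a JobServer instance running, usage would look roughly like `JobServerClient.post(JobServerClient.jarUrl("my-app"), jarBytes)`, followed by a POST with an empty body to `contextUrl("my-context")`, and finally a POST of the job configuration to `syncJobUrl("my-app", "com.example.MyJob", "my-context")`. The application, class, and context names here are placeholders.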
I created a Dockerfile so you can try it out with the most recent (supported) versions of Spark (1.4.1), Spark JobServer (0.6.0) and built-in Mesos support (0.24.1).