Tags: apache-spark, mesos, mesosphere

Spark framework on Mesos


I have a few questions about Spark on Mesos:

  1. When I submit Spark jobs with different Spark contexts on Mesos, does each one get its own Mesos-Spark framework instance, or do they share the same one?
  2. How can I ensure that a separate Spark framework is created each time?
  3. Can I specify constraints to reserve/pre-allocate Mesos slaves for a specific Spark context or framework instance? I understand that this defeats the purpose of Mesos a bit, and that Mesos can guarantee the memory and CPU in coarse-grained mode. But for some reason, I don't want the physical machines that run the tasks (slaves) to be shared across Spark jobs meant for different users.

Solution

  • 1/2: Each Spark context launches a separate Mesos framework. You can confirm this by navigating to the Mesos UI and looking at the registered frameworks, which should correspond one-to-one with the Spark contexts you've created; see the sketch just below.

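    For instance, the following sketch (the master URL, class names, and jar paths are placeholders) submits two independent jobs against the same Mesos master; each driver creates its own SparkContext and therefore shows up as its own framework in the Mesos UI:

        # Two separate submissions: each driver's SparkContext registers
        # its own framework with the Mesos master.
        spark-submit \
          --master mesos://zk://mesos-master:2181/mesos \
          --class com.example.JobForUserA \
          /path/to/user-a-job.jar

        spark-submit \
          --master mesos://zk://mesos-master:2181/mesos \
          --class com.example.JobForUserB \
          /path/to/user-b-job.jar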
    3: You can reserve resources for a set of Spark tasks by reserving a portion of the resources on each slave and tagging them with a particular role.

    For example, if you want to reserve 4 GB of memory and 2 cores on a slave for a certain set of tasks, you can specify --resources="cpus(spark1):2;mem(spark1):4096;cpus(*):2;mem(*):4096" when you launch the slave. This reserves 2 CPUs and 4 GB of RAM for the spark1 role, while leaving 2 CPUs and 4 GB of RAM unreserved for all other frameworks. A framework that registers itself with the spark1 role will then receive the reserved resources plus any available wildcard (*) resources.

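    As a concrete sketch of the slave side (the master address and work directory below are placeholders, and the reservation values match the example above):

        # Launch a slave with 2 CPUs / 4096 MB statically reserved for the
        # "spark1" role, and 2 CPUs / 4096 MB left unreserved (*) for everyone else.
        mesos-slave \
          --master=zk://mesos-master:2181/mesos \
          --work_dir=/var/lib/mesos \
          --resources="cpus(spark1):2;mem(spark1):4096;cpus(*):2;mem(*):4096"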
    On the Spark framework side, I have an open PR that adds support for registering with a specific role (https://github.com/apache/spark/pull/4960), which hopefully will be merged soon. If you'd like to use this feature now, you need to apply that pull request and build Spark yourself.
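
    Once you have a build with role support, the role would be passed as a Spark configuration property at submit time. The sketch below assumes the property name proposed in that PR, spark.mesos.role; double-check it against the Spark version you build:

        # Hypothetical usage with a role-aware Spark build: register the
        # framework under the "spark1" role so it receives the reserved
        # resources from the slave configuration above.
        spark-submit \
          --master mesos://zk://mesos-master:2181/mesos \
          --conf spark.mesos.role=spark1 \
          --class com.example.JobForUserA \
          /path/to/user-a-job.jar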