I have a DC/OS cluster with 3 agent nodes, and a few services such as Spark running on DC/OS.
- If I scale my DC/OS cluster, do I need to scale Spark as well? (If I add a 4th node to the DC/OS cluster and then run a Spark job, the master may allocate resources for the job on the 4th node, where Spark is not installed, and the job will fail.)
From what I have observed, jobs are submitted to any node that the Mesos master can see.
- Is there a way to specify that a Spark job should not run on certain nodes?
Dynamic allocation may help, but I've not used it:
http://spark.apache.org/docs/latest/running-on-mesos.html#dynamic-resource-allocation-with-mesos
http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
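As a rough sketch (I have not verified this on DC/OS), dynamic allocation is switched on through Spark configuration properties passed at submit time; per the Spark-on-Mesos docs it also needs the external shuffle service running on each agent. The class and jar below are placeholders:
# Hypothetical submission with dynamic allocation enabled;
# requires the Mesos external shuffle service on every agent (see links above).
$ dcos spark run --submit-args="--conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --class <your-main-class> <your-application-jar>"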
To install multiple instances of the DC/OS Spark package, set service.name to a unique name (e.g. "spark-dev") in your JSON options file during installation:
{
  "service": {
    "name": "spark-dev"
  }
}
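You then pass that options file when installing the package. Assuming it is saved as spark-dev.json (a filename chosen here for illustration), the install would look roughly like:
# Install an additional Spark instance using the options file above
# (spark-dev.json is an assumed filename).
$ dcos package install spark --options=spark-dev.json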
To use a specific Spark instance from the DC/OS Spark CLI:
$ dcos config set spark.app_id <service.name>
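For example, to target the "spark-dev" instance from above and submit a job to it (class and jar are placeholders):
# Point the CLI at the "spark-dev" instance, then submit a job to it.
$ dcos config set spark.app_id spark-dev
$ dcos spark run --submit-args="--class <your-main-class> <your-application-jar>"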
https://docs.mesosphere.com/1.8/usage/service-guides/spark/install/