I have a standalone Spark cluster with a few nodes. I was able to make it highly available with ZooKeeper. I'm using Spark Jobserver spark-2.0-preview and have configured the jobserver env1.conf
file with the available Spark master URLs as follows:
spark://<master1>:<port>,<master2>:<port>
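For reference, the relevant part of env1.conf looks roughly like this (a minimal sketch; hostnames and the port are placeholders, and the key layout follows the standard jobserver config template):

```
# env1.conf – minimal sketch, hostnames/port are placeholders
spark {
  # list every master; the driver fails over to the standby master if the active one dies
  master = "spark://master1:7077,master2:7077"
}
```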
Everything works fine: if master1 goes down, the jobserver connects to master2.
- But what happens if the machine where the jobserver is installed crashes?
- Is there a way to do something similar to what I did with Spark, i.e. run two jobserver instances on two separate machines with ZooKeeper handling failover if one fails?
- Or do I need to handle that situation myself?
I would go with the third option. I used Spark Jobserver once, not in HA mode, but I was looking for a solution at the time. Here are my thoughts:
- If Spark Jobserver is deployed on only one machine, it is by definition a single point of failure if that machine crashes.
- Spark Jobserver does not use ZooKeeper for node coordination (at least it didn't when I used it); instead it relies on the actor model implemented in the Akka framework.
- The best way, I think, is to handle it yourself. A simple approach is to start multiple Spark Jobserver instances on different machines, all pointing to the same database, and put a proxy in front of them (see the sketch after this list). The problem then moves to the HA of the database server, which is probably easier to solve.
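As an illustration, here is a rough sketch of what each instance's config could contain so they share job and context metadata. The key names follow the JobSqlDAO settings from the jobserver docs as far as I remember; the JDBC URL, user, password and database name are made up, so double-check against your version:

```
# shared-metadata sketch for every jobserver instance (all values are placeholders)
spark {
  jobserver {
    # persist job/context metadata in an external SQL database instead of the default local H2 file
    jobdao = spark.jobserver.io.JobSqlDAO
    sqldao {
      slick-driver = slick.driver.PostgresDriver
      jdbc-driver  = org.postgresql.Driver
      jdbc {
        url      = "jdbc:postgresql://db-host:5432/spark_jobserver"
        user     = "jobserver"
        password = "secret"
      }
    }
  }
}
```

With that in place, both instances see the same metadata, and a plain HTTP proxy (HAProxy, nginx, etc.) in front of them gives clients a single endpoint.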
I suggest checking the Spark Jobserver GitHub repo, as there is a discussion about this: https://github.com/spark-jobserver/spark-jobserver/issues/42