apache-spark, apache-zookeeper, spark-jobserver

Spark Jobserver High Availability


I have a standalone Spark cluster with a few nodes, and I was able to make it highly available with ZooKeeper. I'm using Spark Jobserver spark-2.0-preview, and I have configured the jobserver env1.conf file with the available Spark master URLs like this:

spark://<master1>:<port>,<master2>:<port>

Everything works fine: if master1 is down, the jobserver connects to master2.
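The multi-master URL simply lists every standalone master; per the Spark standalone HA documentation, the client tries to register with all of them and follows whichever one ZooKeeper has elected leader. A minimal sketch of how that URL breaks down into individual masters, assuming a hypothetical helper `parse_spark_master_url` (Spark does its own parsing internally) and the default standalone master port 7077:

```python
from typing import List, Tuple


def parse_spark_master_url(url: str) -> List[Tuple[str, int]]:
    """Hypothetical helper: split 'spark://h1:p1,h2:p2' into (host, port) pairs.

    The order is preserved; with ZooKeeper-based HA the client registers with
    every listed master and follows the current leader.
    """
    prefix = "spark://"
    if not url.startswith(prefix):
        raise ValueError(f"not a spark:// URL: {url!r}")
    masters = []
    for entry in url[len(prefix):].split(","):
        # rpartition keeps everything before the last ':' as the host
        host, sep, port = entry.strip().rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {entry!r}")
        masters.append((host, int(port)))
    return masters


# Example with hypothetical host names and the default master port 7077
print(parse_spark_master_url("spark://master1:7077,master2:7077"))
```

This only explains the master side: the jobserver process itself is not covered by this list, which is exactly what the question below is about.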

  • But what happens if the machine where the jobserver is installed crashes?
  • Is there a way to do something like what I did with Spark, i.e. run 2 jobserver instances on 2 separate machines and have ZooKeeper handle failover if one fails?
  • Or do I need to manage that situation by myself?

Solution

  • I would go with the third option. I used Spark Jobserver once, not in HA mode, but I was looking for an HA solution at the time. Here are my thoughts:

    • If Spark Jobserver is deployed on only one machine, it is by default a single point of failure in case that machine crashes.
    • Spark Jobserver does not use ZooKeeper for node coordination (at least it didn't when I used it); instead it relies on the actor model implemented in the Akka framework.
    • The best way, I think, is to handle it yourself. A simple approach is to start multiple Spark Jobserver instances on different machines, all pointing to the same database, with a proxy in front of them. The problem then moves to the HA of the database server, which is probably easier to solve.

    I suggest checking the Spark Jobserver GitHub repo, because this is discussed there (https://github.com/spark-jobserver/spark-jobserver/issues/42).
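The "proxy in front of several instances" idea boils down to health-checking each jobserver and routing to one that answers. A minimal sketch of that selection logic, assuming hypothetical host names, the jobserver's usual REST port 8090, and `GET /contexts` as a stand-in health-check endpoint (verify what your jobserver version exposes). In practice a dedicated proxy such as HAProxy or nginx would do this, and the shared database is not covered here:

```python
import urllib.request
from typing import Callable, Optional, Sequence


def http_is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Treat an HTTP 200 from GET <base_url>/contexts as 'alive'.

    Assumption: /contexts is used as a cheap health check; any network error,
    non-2xx response (HTTPError), or malformed URL counts as unhealthy.
    """
    try:
        url = base_url.rstrip("/") + "/contexts"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError/HTTPError/timeouts are OSError subclasses;
        # ValueError covers an unknown URL scheme
        return False


def pick_backend(backends: Sequence[str],
                 is_healthy: Callable[[str], bool] = http_is_healthy) -> Optional[str]:
    """Return the first backend that passes the health check, or None if all are down."""
    for url in backends:
        if is_healthy(url):
            return url
    return None


# Hypothetical instances; a stub health check stands in for a crashed first node
BACKENDS = ["http://jobserver-a:8090", "http://jobserver-b:8090"]
print(pick_backend(BACKENDS, is_healthy=lambda u: u != "http://jobserver-a:8090"))
```

Picking the first healthy instance gives an active/passive setup; because all instances share the same database, either one can serve requests, but contexts and running jobs on a crashed instance are not migrated by this logic.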