apache-spark, cassandra, spark-cassandra-connector

How to specify multiple Spark Standalone masters (for spark.master property)?


I have 1 master and 3 worker nodes communicating with the master.

For disaster recovery we have created 2 masters and let ZooKeeper elect the leader. I am using DataStax's Spark Cassandra Connector. Is there a way to pass multiple Spark master URLs so that each is tried in turn until one succeeds?

new SparkConf(true)
        .set("spark.cassandra.connection.host", "10.3.2.1")  
        .set("spark.cassandra.auth.username","cassandra")
        .set("spark.cassandra.auth.password",cassandra"))
        .set("spark.master", "spark://1.1.2.2:7077") // Can I give multiple Urls here?
        .set("spark.app.name","Sample App");

Solution

  • tl;dr Use a comma to separate host:port entries, e.g. spark://localhost:7077,localhost:17077 (a full configuration sketch follows at the end of this answer).

    Please note that you should avoid hardcoding connection details in your application code; they are an operational concern and should really be defined using spark-submit's --master command-line option:

    $ ./bin/spark-submit --help
    
    Options:
      --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
    

    See the relevant Spark code where the parsing happens:

    val masterUrls = sparkUrl.split(",").map("spark://" + _)
    

    where sparkUrl is the portion captured by the """spark://(.*)""".r regex.
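
    To tie this back to the question's configuration, here is a minimal Scala sketch using the comma-separated form. Only the 1.1.2.2 master comes from the question; the standby address 1.1.2.3:7077 is a placeholder for the second ZooKeeper-managed master, so substitute your real host:port pairs.

    import org.apache.spark.{SparkConf, SparkContext}

    // 1.1.2.2 is the master from the question; 1.1.2.3 is an assumed
    // standby master participating in ZooKeeper leader election.
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "10.3.2.1")
      .set("spark.cassandra.auth.username", "cassandra")
      .set("spark.cassandra.auth.password", "cassandra")
      // Comma-separated standalone masters: the driver registers with
      // whichever one is currently the elected leader.
      .setMaster("spark://1.1.2.2:7077,1.1.2.3:7077")
      .setAppName("Sample App")

    val sc = new SparkContext(conf)

    The same comma-separated value works on the command line, e.g. spark-submit --master spark://1.1.2.2:7077,1.1.2.3:7077, which keeps the master list out of the application code as recommended above.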