Tags: apache-spark, spark-ec2

Apache Spark EC2 Script launching slaves but no master


When using the Apache Spark EC2 script to launch a cluster, I have found something of a bug which is beginning to hit my pocket. When specifying the number of slaves: if you enter a number greater than or equal to your EC2 instance limit, the cluster is launched with your maximum number of slaves, but no master. This leaves you with no control over the slaves, and thus no usable cluster.
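For reference, the launch invocation looks roughly like this (the key pair, identity file, and cluster name are placeholders; -s sets the number of slaves):

./spark-ec2 -k my-keypair -i my-keypair.pem -s 4 launch my_cluster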

I have not found a way to launch just a master with the Apache Spark EC2 script. I manually shut down one of the slaves to make space for a master; however, when I then re-run the script, it just says:

Searching for existing cluster my_cluster...
Found 0 master(s), 4 slaves
ERROR: There are already instances running in group my_cluster-master
or my_cluster-slaves

To overcome this, I have to log into the AWS console, terminate all the instances, and then relaunch the cluster. As Amazon charges per hour, I am being billed for a full hour's worth of my maximum number of instances, all for nothing.
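For what it's worth, the teardown-and-relaunch cycle can also be done from the script itself rather than the console, assuming the standard spark-ec2 actions (same cluster name as at launch):

./spark-ec2 destroy my_cluster
./spark-ec2 -k my-keypair -i my-keypair.pem -s 3 launch my_cluster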

Is there a way to launch a master when slaves already exist?


Solution

  • This is happening because spark-ec2 makes two separate requests to EC2 to allocate instances: one for the master and one for the slaves.

    And as you might guess, it allocates the master instance after the slaves, which is what causes the issue you are seeing; a simplified sketch of the ordering appears below.

    There is no way to launch a master when slaves already exist. Only the reverse is supported: launching slaves when a master already exists.

    This behavior of launching the master after the slaves sounds like a bug.

    If you'd like to report it so it gets fixed, I suggest creating an issue on the Apache JIRA for Spark under the EC2 component. I'll take a look at it.
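To make the failure mode concrete, here is a minimal sketch of the ordering problem. It is not the actual spark_ec2.py code; the limit value and the "grant whatever quota remains" behaviour are assumptions used purely for illustration:

# Simplified illustration of the ordering problem, not the actual spark_ec2.py code.
# It shows why the slave request can succeed while the later master request
# comes up empty once the account's instance limit has been consumed.

INSTANCE_LIMIT = 4   # hypothetical per-region EC2 instance limit
running = 0          # instances already allocated in this account

def request_instances(count, role):
    """Pretend EC2 allocation: grant up to whatever quota remains."""
    global running
    granted = min(count, INSTANCE_LIMIT - running)
    running += granted
    print(f"{role}: requested {count}, granted {granted}")
    return granted

# spark-ec2 allocates the slave instances first...
request_instances(4, "slaves")   # consumes the entire limit
# ...and only afterwards the master, which can no longer be granted.
request_instances(1, "master")   # granted 0 -> slaves running, but no master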