8-node virtual-metal cluster, with 4 nodes used for analytics. DSE version 4.8.6, Spark version 1.4.2. ... Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Getting this error (repeatedly) when running dse pyspark or dse spark:
org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor
I think this only happens when an interactive shell is used; a job returns results fine when submitted like this:
$ dse spark-submit ./test.py
WARN 2016-05-05 19:21:51,614 org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+---------+---------+-------+---------------+----------+
( results )
This was apparently a firewall issue.
I was pretty sure that I had opened all the ports listed here: https://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/sec/secConfFirePort.html ...
... but at the time, it looked like communication was being attempted on a random port. That wouldn't be a problem for the server sending the request, but it would be for the server receiving it, since no firewall rule would match the inbound connection.
So I turned the firewall off, and everything worked.
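Disabling the firewall isn't a great long-term fix, though. In Spark 1.x, the driver, executors, block manager, and file server each bind to a random ephemeral port by default, which is why static firewall rules don't catch them. A sketch of an alternative, assuming you can edit spark-defaults.conf on the analytics nodes (the specific port numbers here are arbitrary choices, not DataStax recommendations), is to pin those ports and then open just that small range in the firewall:

```
# spark-defaults.conf -- pin Spark's normally-random ports to fixed values
# so firewall rules can be written against them (port numbers are examples)
spark.driver.port            38000
spark.blockManager.port      38001
spark.fileserver.port        38002
spark.broadcast.port         38003
spark.replClassServer.port   38004
spark.executor.port          38005
```

With these set, opening 38000-38005 between the analytics nodes should let the interactive shells work with the firewall back on.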