8-node virtual-metal cluster, with 4 nodes used for analytics. DSE version 4.8.6, Spark version 1.4.2. ... Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Getting this error (repeatedly) when running dse pyspark or dse spark:
org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor
I think this only happens when an interactive shell is used; a job returns results fine when submitted like this:
$ dse spark-submit ./test.py
WARN 2016-05-05 19:21:51,614 org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+---------+---------+-------+---------------+----------+
( results )
This was apparently a firewall issue.
I was pretty sure that I had opened all the ports listed here: https://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/sec/secConfFirePort.html ...
... but at the time, it looked like communication was being attempted on a random port. That wouldn't be a problem for the server sending the request, but it would be for the server receiving it, since no firewall rule would match the inbound connection.
So I turned the firewall off, and everything worked.
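Disabling the firewall isn't a great long-term fix, though. In Spark 1.x, the driver, executors, block manager, and file server each bind to a random ephemeral port by default, which is why static firewall rules don't catch them. A sketch of an alternative, assuming you can edit spark-defaults.conf on the analytics nodes (the specific port numbers here are arbitrary choices, not DataStax recommendations), is to pin those ports and then open just that small range in the firewall:

```
# spark-defaults.conf -- pin Spark's normally-random ports to fixed values
# so firewall rules can be written against them (port numbers are examples)
spark.driver.port            38000
spark.blockManager.port      38001
spark.fileserver.port        38002
spark.broadcast.port         38003
spark.replClassServer.port   38004
spark.executor.port          38005
```

With these set, opening 38000-38005 between the analytics nodes should let the interactive shells work with the firewall back on.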