Tags: cassandra, apache-spark, datastax, datastax-enterprise

Enable Spark on Same Node As Cassandra


I am trying to test out Spark so I can summarize some data I have in Cassandra. I've been through the DataStax tutorials, but they are vague about how you actually enable Spark. The only indication I can find is that it comes enabled automatically when you select an "Analytics" node during install. However, I have an existing Cassandra node, and I don't want to use a different machine for testing, since I am just evaluating everything on my laptop.

Is it possible to just enable Spark on the same node and deal with any performance implications? If so how can I enable it so that it can be tested?

I see the folders for Spark there (although I'm not positive all the files are present), but when I check whether a Spark master is set, it says that no Spark nodes are enabled:

dsetool sparkmaster

I am using Linux Mint (Ubuntu-based).

I'm just looking for a quick and dirty way to get my data averaged and so forth. Spark seems like the way to go since it's a massive amount of data, but I want to avoid paying to host multiple machines, at least for now while testing.


Solution

  • Yes. Spark can run on the same node as Cassandra, and it is able to interact with the cluster even if it is not enabled on all of the nodes.

    Package install

    Edit the /etc/default/dse file and set the appropriate flags for
    the type of node you want:
    ...
    
    Spark nodes:
    SPARK_ENABLED=1
    HADOOP_ENABLED=0
    SOLR_ENABLED=0
    

    Then restart the DSE service:

    http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/reference/refDseServ.html
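    As a sketch of that package-install edit, here is the flag change applied to a local copy of the file, so it can be dry-run on a machine without DSE installed (on a real install you would edit /etc/default/dse itself, as root, and the service name is assumed to be dse):

    ```shell
    # Dry-run sketch: flip SPARK_ENABLED in a local stand-in for /etc/default/dse.
    printf 'SPARK_ENABLED=0\nHADOOP_ENABLED=0\nSOLR_ENABLED=0\n' > dse-flags.conf

    # Enable Spark, leaving the other flags untouched.
    sed -i 's/^SPARK_ENABLED=.*/SPARK_ENABLED=1/' dse-flags.conf

    # Show the result of the edit.
    grep '^SPARK_ENABLED' dse-flags.conf
    ```

    On a real package install, follow the same edit on /etc/default/dse with something like `sudo service dse restart` afterwards (service name assumed).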

    Tar Install

    Stop DSE on the node, then restart it using the following command
    from the install directory:
    ...
    Spark only node: $ bin/dse cassandra -k - Starts Spark trackers on a cluster of Analytics nodes.
    

    http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/reference/refDseStandalone.html
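    For a tar install, the stop-and-restart sequence might look like the following sketch. Here DSE_HOME is a hypothetical install path, and the script guards against the binary being absent so it can be dry-run on a machine without DSE:

    ```shell
    # Hedged sketch: restart a tarball DSE install as a Spark (Analytics) node.
    # DSE_HOME is a hypothetical path; adjust it to your actual install directory.
    DSE_HOME="${DSE_HOME:-/opt/dse}"

    if [ -x "$DSE_HOME/bin/dse" ]; then
        "$DSE_HOME/bin/dse" cassandra-stop   # stop the running node
        "$DSE_HOME/bin/dse" cassandra -k     # restart with Spark trackers enabled
    else
        echo "dse binary not found at $DSE_HOME/bin/dse"
    fi
    ```

    Once the node is back up, `dsetool sparkmaster` should report a Spark master instead of saying no Spark nodes are enabled.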