Tags: cassandra, apache-spark, datastax-enterprise

How to load Spark Cassandra Connector in the shell?


I am trying to use the Spark Cassandra Connector with Spark 1.1.0.

I have successfully built the jar file from the master branch on GitHub and gotten the included demos to work. However, when I try to load the jar into the spark-shell, I cannot import any of the classes from the com.datastax.spark.connector package.

I have tried using the --jars option on spark-shell and adding the directory containing the jar to Java's CLASSPATH. Neither option works. In fact, when I use the --jars option, the logging output shows that the DataStax jar is being loaded, yet I still cannot import anything from com.datastax.

I have been able to load the Tuplejump Calliope Cassandra connector into the spark-shell using --jars, so I know that mechanism works. It is just the DataStax connector that fails for me.
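
For reference, my failing invocation looked roughly like this (the jar path is illustrative, not my exact one):

    $ $SPARK_HOME/bin/spark-shell --jars /path/to/spark-cassandra-connector_2.10-1.1.0.jar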


Solution

  • I got it working. The trick was to build the assembly (fat) jar, which bundles the connector together with its dependencies, and pass that single jar to spark-shell:

    $ git clone https://github.com/datastax/spark-cassandra-connector.git
    $ cd spark-cassandra-connector
    $ sbt/sbt assembly    # the assembly jar lands under spark-cassandra-connector/target/scala-2.10/
    $ $SPARK_HOME/bin/spark-shell --jars ~/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/connector-assembly-1.2.0-SNAPSHOT.jar
    
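    If the build succeeds, you can confirm the assembly jar exists before launching the shell (the exact file name varies with the connector version you built):

    $ ls ~/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/
    connector-assembly-1.2.0-SNAPSHOT.jar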

    At the Scala prompt, stop the default SparkContext and create a new one pointed at your Cassandra host:

    scala> sc.stop  // stop the shell's default SparkContext before creating a configured one
    scala> import com.datastax.spark.connector._
    scala> import org.apache.spark.SparkContext
    scala> import org.apache.spark.SparkContext._
    scala> import org.apache.spark.SparkConf
    scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host", "my cassandra host")
    scala> val sc = new SparkContext("spark://spark host:7077", "test", conf)
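
    With the new context in place, a quick smoke test is to read a table through the connector. This is a minimal sketch assuming a hypothetical keyspace test containing a table kv:

    scala> val rdd = sc.cassandraTable("test", "kv")
    scala> println(rdd.count)
    scala> println(rdd.first)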