apache-spark, cassandra, spark-cassandra-connector

How to implement rdd.bulkSaveToCassandra in datastax


  • I am using a DataStax cluster with DSE 5.0.5:

    [cqlsh 5.0.1 | Cassandra 3.0.11.1485 | DSE 5.0.5 | CQL spec 3.4.0 | Native proto

I am using spark-cassandra-connector 1.6.8.

I tried to implement the code below, but the import is not working:

val rdd: RDD[SomeType] = ... // create some RDD to save
import com.datastax.bdp.spark.writer.BulkTableWriter._

rdd.bulkSaveToCassandra(keyspace, table)

Can someone suggest how to implement this code? Are there any dependencies required for it?


Solution

  • The Cassandra Spark Connector has a saveToCassandra method that can be used like this (taken from the documentation):

    val collection = sc.parallelize(Seq(("cat", 30), ("fox", 40)))
    collection.saveToCassandra("test", "words", SomeColumns("word", "count"))
    

    There is also saveAsCassandraTableEx, which lets you control schema creation, among other things; it is also described in the documentation referenced above.
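    As a rough sketch of what that looks like, here the target table's schema is declared explicitly via TableDef/ColumnDef before saving (the keyspace, table, and column names are made up for illustration; this assumes the 1.6.x connector API):

    ```scala
    import com.datastax.spark.connector._
    import com.datastax.spark.connector.cql.{ColumnDef, ClusteringColumn, PartitionKeyColumn, RegularColumn, TableDef}
    import com.datastax.spark.connector.types.{IntType, TextType}

    // Describe the target table ourselves instead of letting the connector infer it
    val table = TableDef(
      keyspaceName = "test",
      tableName = "words_new",
      partitionKey = Seq(ColumnDef("word", PartitionKeyColumn, TextType)),
      clusteringColumns = Seq(ColumnDef("num", ClusteringColumn(0), IntType)),
      regularColumns = Seq(ColumnDef("count", RegularColumn, IntType)))

    // Create the table with that schema (if needed) and save the RDD into it
    sc.parallelize(Seq(("cat", 1, 30), ("fox", 2, 40)))
      .saveAsCassandraTableEx(table, SomeColumns("word", "num", "count"))
    ```

    The payoff over plain saveAsCassandraTable is control over which columns become the partition key and clustering columns.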

    To use them, you need to import com.datastax.spark.connector._, as described in the "Connecting to Cassandra" document.
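    Putting it together, a minimal self-contained job might look like the following (the contact-point host and app name are placeholders, and this assumes Spark 1.6 with connector 1.6.8):

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._ // brings saveToCassandra into scope on RDDs

    object SaveExample {
      def main(args: Array[String]): Unit = {
        // spark.cassandra.connection.host must point at one of the cluster's contact points
        val conf = new SparkConf()
          .setAppName("save-example")
          .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
        val sc = new SparkContext(conf)

        // Save (word, count) pairs into the existing table test.words
        val collection = sc.parallelize(Seq(("cat", 30), ("fox", 40)))
        collection.saveToCassandra("test", "words", SomeColumns("word", "count"))

        sc.stop()
      }
    }
    ```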

    And you need to add the corresponding dependency, though how you do that depends on which build system you use.
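    For sbt, for example, the dependency matching the connector version from the question would look like this (adapt the coordinates for Maven or Gradle):

    ```scala
    // build.sbt — open-source Spark Cassandra Connector, version matching the question
    libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.8"
    ```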

    The bulkSaveToCassandra method is available only when you're using DSE's connector. You need to add the corresponding dependencies - see the documentation for more details. But even the primary developer of the Spark connector says it's better to use saveToCassandra instead.