apache-spark, cassandra, spark-cassandra-connector

How do we operate with multiple Cassandra setups in Apache Spark?


I have two different Cassandra setups on two different machines. I want to read data from one machine, process it with Spark, and then write the result to the second setup. I am using spark-cassandra-connector-java_2.10. When I use javaFunctions.writerBuilder, it lets me specify the keyspace and table name, but the Cassandra host is taken from the Spark context. Is there a way to write data to a Cassandra setup other than the one configured in the Spark context? How do I override this default setting?


Solution

  • Use the following code:

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
    import com.datastax.spark.connector.cql.CassandraConnector;
    import org.apache.spark.SparkConf;
    
    // SparkConf pointing at the target (second) Cassandra cluster
    SparkConf confForCassandra = new SparkConf().setAppName("ConnectToCassandra")
                    .setMaster("local[*]")
                    .set("spark.cassandra.connection.host", "<cassandraHost>");
    
    // A connector built from this conf overrides the host in the Spark context
    CassandraConnector connector = CassandraConnector.apply(confForCassandra);
    
    javaFunctions(rdd).writerBuilder("keyspace", "table", mapToRow(Table.class))
                    .withConnector(connector)
                    .saveToCassandra();
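For the full scenario in the question (read from one cluster, transform, write to the other), a minimal sketch might look like the following. The host names, keyspace/table names, and the `Table` mapping class are placeholders, and running it requires two reachable Cassandra clusters:

```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

import com.datastax.spark.connector.cql.CassandraConnector;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TwoClusterJob {
    public static void main(String[] args) {
        // The Spark context itself points at the *source* cluster
        SparkConf conf = new SparkConf().setAppName("TwoClusterJob")
                .setMaster("local[*]")
                .set("spark.cassandra.connection.host", "<sourceHost>");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read from the source cluster; mapRowTo maps rows onto Table beans
        JavaRDD<Table> rdd = javaFunctions(sc)
                .cassandraTable("source_ks", "source_table", mapRowTo(Table.class));

        // ... any Spark transformations on rdd ...

        // A separate conf + connector for the *destination* cluster;
        // only the connection host needs to differ
        SparkConf destConf = new SparkConf()
                .set("spark.cassandra.connection.host", "<destinationHost>");
        CassandraConnector destConnector = CassandraConnector.apply(destConf);

        // withConnector overrides the host taken from the Spark context
        javaFunctions(rdd)
                .writerBuilder("dest_ks", "dest_table", mapToRow(Table.class))
                .withConnector(destConnector)
                .saveToCassandra();

        sc.stop();
    }
}
```

The key design point is that `CassandraConnector` is serializable and independent of the `SparkContext`, so you can hold one connector per cluster and pass whichever one you need to the writer.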