java apache-spark cassandra datastax

Why is there no ReaderBuilder in the Spark Cassandra Connector for reading data from Cassandra?


I looked through the GitHub repo of spark-cassandra-connector and could not find a ReaderBuilder implemented there, although a WriterBuilder is. Can anyone help me with this? I want to read data from Cassandra using a CassandraConnector reference.

I want to connect to two Cassandra clusters from the same SparkContext and read data from both of them, so I need a ReaderBuilder to read data from my second Cassandra cluster. I am working in Java.

GitHub repo link: https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/java/com/datastax/spark/connector/japi/RDDAndDStreamCommonJavaFunctions.java

CassandraConnector eventsConnector = CassandraConnector.apply(sc.getConf().set("spark.cassandra.connection.host", "192.168.36.234"));

Solution

  • My first suggestion would be not to use RDDs in Java. RDDs are much harder to work with in Java than in Scala, and the RDD API is the older API anyway. I would suggest using DataFrames instead: they give you a much cleaner interface across different data sources, plus automatic query optimization and other benefits.

    If you cannot use DataFrames, just create the CassandraJavaRDD as usual and then call "withConnector" or "withReadConf" on it to change the read configuration.

    https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/java/com/datastax/spark/connector/japi/rdd/CassandraJavaRDD.java#L123-L129

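    Something like this:

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
    import com.datastax.spark.connector.cql.CassandraConnector;

    // sc is your JavaSparkContext and ks holds the keyspace name.
    // getConf() returns a copy, so the context's own configuration is not modified.
    CassandraConnector cluster2 = CassandraConnector.apply(
        sc.getConf().set("spark.cassandra.connection.host", "192.168.36.234"));

    // Read "test_table" from the second cluster instead of the default one
    javaFunctions(sc).cassandraTable(ks, "test_table").withConnector(cluster2).collect();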
    

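    There is no need for a reader builder because the RDD itself has a fluent API: cassandraTable(...) is lazy, so you can keep refining it with calls such as withConnector or withReadConf before an action like collect() actually runs the read. Writing, by contrast, happens immediately when saveToCassandra is called, so every write option has to be supplied before that call, which is why the writer needs a builder.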
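To make the design argument above concrete, here is a small pure-Java toy model. TableScan, WriterBuilder, withHost and the simulated rows are hypothetical names made up for this sketch; they are not the connector's real classes, and no Spark or Cassandra is involved. The lazy scan is immutable and does nothing until collect(), so configuring it after creation is just another chained call. The write runs the moment save() is called, so its settings have to be gathered first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model (NOT the connector's real API) contrasting a lazy,
// immutable fluent object with a builder that feeds an eager action.
class FluentVsBuilderDemo {

    // Lazy read: each with* call returns a NEW scan; nothing runs until collect().
    static final class TableScan {
        private final String host;
        private final String table;

        TableScan(String host, String table) {
            this.host = host;
            this.table = table;
        }

        TableScan withHost(String newHost) {
            return new TableScan(newHost, table); // the original scan is left unchanged
        }

        // The "action": only here is the configuration actually used.
        List<String> collect() {
            List<String> rows = new ArrayList<>();
            rows.add(table + "@" + host); // simulated row tagged with its source host
            return rows;
        }
    }

    // Eager write: save() does the work immediately, so every setting
    // must be collected beforehand, which is the builder's job.
    static final class WriterBuilder {
        private final String table;
        private final List<String> log;
        private String host = "localhost";

        WriterBuilder(String table, List<String> log) {
            this.table = table;
            this.log = log;
        }

        WriterBuilder withHost(String newHost) {
            this.host = newHost;
            return this;
        }

        void save() {
            log.add("wrote to " + table + "@" + host); // side effect happens right now
        }
    }

    public static void main(String[] args) {
        TableScan defaultScan = new TableScan("localhost", "test_table");
        TableScan cluster2Scan = defaultScan.withHost("192.168.36.234");
        System.out.println(defaultScan.collect());  // [test_table@localhost]
        System.out.println(cluster2Scan.collect()); // [test_table@192.168.36.234]

        List<String> log = new ArrayList<>();
        new WriterBuilder("test_table", log).withHost("192.168.36.234").save();
        System.out.println(log); // [wrote to test_table@192.168.36.234]
    }
}
```

Because each withHost call on the scan returns a fresh object, two reads against two clusters can share a common starting point without interfering, which is the same reason withConnector on the real RDD is enough without a separate reader builder.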