scala, apache-spark, cassandra, datastax

Spark Cassandra tuning


How do I set the following Cassandra write parameters in Spark Scala code when using DataStax Spark Cassandra Connector 1.6.3?

Spark version - 1.6.2

spark.cassandra.output.batch.size.rows

spark.cassandra.output.concurrent.writes

spark.cassandra.output.batch.size.bytes

spark.cassandra.output.batch.grouping.key

Thanks, Chandra


Solution

  • In DataStax Spark Cassandra Connector 1.6.x, these are ordinary Spark properties, so you can set them on your SparkConf before creating the SparkContext:

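    import org.apache.spark.{SparkConf, SparkContext}

    // The values below are examples, not tuning recommendations.
    val conf = new SparkConf(true)
        .set("spark.cassandra.connection.host", "192.168.123.10")
        .set("spark.cassandra.auth.username", "cassandra")
        .set("spark.cassandra.auth.password", "cassandra")
        // Rows per batch; the default "auto" sizes batches by bytes instead.
        .set("spark.cassandra.output.batch.size.rows", "100")
        // Maximum number of batches written in parallel by one Spark task.
        .set("spark.cassandra.output.concurrent.writes", "100")
        // Batch size in bytes; as far as I know, only used when batch.size.rows is "auto".
        .set("spark.cassandra.output.batch.size.bytes", "100")
        // How rows are grouped into batches: none, replica_set, or partition.
        .set("spark.cassandra.output.batch.grouping.key", "partition")

    val sc = new SparkContext("spark://192.168.123.10:7077", "test", conf)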
    

    You can refer to this readme for more information.
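
    If you only want to tune one particular write rather than the whole application, the connector also accepts a WriteConf on saveToCassandra. The following is a minimal sketch, not a definitive implementation: the keyspace "test_ks", the table "kv" (key text PRIMARY KEY, value int), and the app name are hypothetical, and the field names batchSize and parallelismLevel are what I believe the 1.6.x WriteConf exposes, so check them against the version you actually use.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._                 // saveToCassandra, SomeColumns, RowsInBatch
    import com.datastax.spark.connector.writer.WriteConf

    object PerWriteTuning {
      def main(args: Array[String]): Unit = {
        // The master URL is expected to come from spark-submit.
        val conf = new SparkConf(true)
          .setAppName("per-write-tuning")                 // hypothetical app name
          .set("spark.cassandra.connection.host", "192.168.123.10")
        val sc = new SparkContext(conf)

        // Start from whatever is already in SparkConf, then override
        // only the settings that should differ for this one write.
        val writeConf = WriteConf.fromSparkConf(sc.getConf).copy(
          batchSize = RowsInBatch(100),                   // like output.batch.size.rows
          parallelismLevel = 10                           // like output.concurrent.writes
        )

        // Hypothetical data matching the assumed table kv(key, value).
        val rows = sc.parallelize(Seq(("a", 1), ("b", 2)))
        rows.saveToCassandra("test_ks", "kv", SomeColumns("key", "value"), writeConf)

        sc.stop()
      }
    }
    ```

    Settings given through WriteConf apply to that saveToCassandra call only; other writes keep using the values from SparkConf.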