apache-spark, cassandra, cassandra-3.0, spark-cassandra-connector

spark-cassandra-connector configuration: concurrent.reads vs input.reads_per_sec


I'm confused after reading the read tuning parameters at https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#read-tuning-parameters

concurrent.reads: Sets read parallelism for joinWithCassandra tables.

input.reads_per_sec: Sets max requests per core per second for joinWithCassandraTable

Description of concurrent.reads from an SDE at DataStax: https://groups.google.com/a/lists.datastax.com/d/msg/spark-connector-user/PaQm1LT7Qlk/h41WLnHfBAAJ

Concurrent reads set to 4 means in a 4 core spark executor means, 16 requests will run MAX at the same time.

So it looks like concurrent.reads does the same thing as input.reads_per_sec.

What is the actual difference between them?


Solution

  • They are not the same, but they are related:

    • concurrent.reads defines how many requests per core can be in flight at the same time (so-called in-flight requests). In some cases you can lower it from the default to avoid overloading Cassandra nodes with too many parallel requests;
    • input.reads_per_sec defines the maximum number of requests per core that can be executed per second.
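
Put differently, one setting caps parallelism and the other caps rate. A rough way to see how they interact is the back-of-the-envelope model below. It is only a sketch, not the connector's implementation: it assumes every request takes the same fixed latency, the helper names and all the numbers are made up for illustration, and the property names assume the usual spark.cassandra. prefix from the reference page.

```python
# Settings as they would be passed to Spark; the values are hypothetical.
settings = {
    "spark.cassandra.concurrent.reads": 4,
    "spark.cassandra.input.reads_per_sec": 1000,
}


def in_flight_per_executor(cores, concurrent_reads):
    """Maximum number of requests running at the same time on one executor."""
    return cores * concurrent_reads


def max_reads_per_sec_per_core(concurrent_reads, reads_per_sec, latency_sec):
    """Upper bound on completed requests per second per core under both limits."""
    # With at most `concurrent_reads` requests in flight, each taking
    # `latency_sec`, no more than concurrent_reads / latency_sec can finish
    # per second (Little's law).
    concurrency_bound = concurrent_reads / latency_sec
    # The rate limit is a separate, independent cap; the tighter one wins.
    return min(concurrency_bound, reads_per_sec)


concurrent = settings["spark.cassandra.concurrent.reads"]
rate = settings["spark.cassandra.input.reads_per_sec"]

# The example from the question: 4 cores with concurrent.reads = 4.
per_executor = in_flight_per_executor(cores=4, concurrent_reads=concurrent)

# Fast reads (2 ms, assumed): the rate limit is the binding constraint.
fast = max_reads_per_sec_per_core(concurrent, rate, latency_sec=0.002)

# Slow reads (50 ms, assumed): the concurrency cap is the binding constraint.
slow = max_reads_per_sec_per_core(concurrent, rate, latency_sec=0.050)

print(per_executor, fast, slow)
```

With these assumed numbers, fast requests are held back by input.reads_per_sec, while slow requests never get near that rate because only 4 can be in flight per core. So depending on request latency, either setting can be the one that actually limits throughput.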