
Using a Datastax Cassandra ResultSet with Java 8 Parallel Streams - Quickly


I am fetching a lot of rows from Cassandra using the Datastax Driver and I need to process them as quickly as possible.

I have looked into using List::parallelStream().forEach(), which seemed promising since ResultSet acts a lot like a List, but I cannot call parallelStream() directly on a ResultSet. To get this to work I first have to call ResultSet::all(), which is really slow. I assume it iterates over each element.

ResultSet rs = this.getResultSet(); // Takes <1 second

// Convert the ResultSet to a List so that I can use parallelStream().
List<Row> rsList = rs.all(); // Takes 21 seconds

rsList.parallelStream().forEach(this::processRow); // Takes 3 seconds

Is there any faster way I can process each row of the result set?


Solution

  • "To get this to work I first have to use ResultSet::all() which really is slow"

    ResultSet.all() fetches all rows using server-side paging, so it has to pull every page before it returns the list. You can control the page size with statement.setFetchSize().

    "Is there any faster way I can process each row of the result set?"

    It depends on your query. What is it? If you are doing a full partition scan, only a couple of machines are doing the work. If you are fetching data from multiple partitions, you can try to parallelize the reads by issuing multiple queries, one for each partition.
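
The "one query per partition" idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not driver code: PartitionFanOut, fetchAll and queryOnePartition are hypothetical names, and queryOnePartition stands in for whatever runs your per-partition query (for example a call to session.execute with the partition key bound). A plain thread pool is used here to keep the sketch self-contained. The driver's own session.executeAsync could play the same role.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper: fans out one query per partition key and merges the results.
final class PartitionFanOut {

    /**
     * Runs queryOnePartition once per key on a fixed-size pool and returns
     * all rows, grouped in the same order as partitionKeys.
     * queryOnePartition is an assumption: it stands in for the real per-partition query.
     */
    static <K, R> List<R> fetchAll(List<K> partitionKeys,
                                   Function<K, List<R>> queryOnePartition,
                                   int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Start every per-partition query before waiting on any of them.
            List<CompletableFuture<List<R>>> inFlight = new ArrayList<>();
            for (K key : partitionKeys) {
                inFlight.add(CompletableFuture.supplyAsync(
                        () -> queryOnePartition.apply(key), pool));
            }

            // Collect results in submission order; join() rethrows query failures unchecked.
            List<R> rows = new ArrayList<>();
            for (CompletableFuture<List<R>> future : inFlight) {
                rows.addAll(future.join());
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }
}
```

Usage would look something like PartitionFanOut.fetchAll(keys, key -> loadRowsFor(key), 8), where loadRowsFor is again a placeholder for your own query method. Whether this beats a single paged query depends on how many partitions you read and how they are spread across the cluster, so it is worth measuring.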