I am fetching a lot of rows from Cassandra using the Datastax Driver and I need to process them as quickly as possible.
I have looked into using List::parallelStream().forEach() which seems great at first since ResultSet acts a lot like a List, but sadly I am unable to use parallelStream() directly on ResultSet. To get this to work I first have to use ResultSet::all() which really is slow - I assume it iterates over each element.
ResultSet rs = this.getResultSet(); // Takes <1 second
// Convert the ResultSet to a List so that I can use parallelStream().
List<Row> rsList = rs.all(); // Takes 21 seconds
rsList.parallelStream().forEach(this::processRow); // Takes 3 seconds
Is there any faster way I can process each row of the result set?
To get this to work I first have to use ResultSet::all() which really is slow
ResultSet.all() will fetch all rows using server-side paging. You can control the page size with statement.setFetchSize().
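Since the time goes into fetching pages rather than into the List conversion itself, one option is to skip all() and start processing rows while later pages are still arriving. Below is a minimal sketch of that idea. It assumes that ResultSet implements Iterable<Row> (which, as far as I know, it does in the DataStax Java driver), and it uses only the JDK so it runs without a cluster; the class name LazyParallelRows and the helper processInParallel are made up for illustration, and a plain List stands in for the ResultSet.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

// Hypothetical helper, not part of the DataStax driver.
class LazyParallelRows {

    // Builds a parallel stream directly from any Iterable, so elements are
    // handed to worker threads as the iterator yields them instead of being
    // copied into a List first.
    static <T> void processInParallel(Iterable<T> rows, Consumer<? super T> processRow) {
        StreamSupport.stream(rows.spliterator(), true).forEach(processRow);
    }

    public static void main(String[] args) {
        // Stand-in for the ResultSet; with the real driver you would pass
        // the ResultSet itself (assumption: it is an Iterable<Row>).
        List<Integer> fakeRows = List.of(1, 2, 3, 4, 5);

        AtomicLong sum = new AtomicLong();
        processInParallel(fakeRows, n -> sum.addAndGet(n));

        System.out.println("sum of processed rows = " + sum.get());
    }
}
```

With the real driver the call would look like processInParallel(rs, this::processRow). This does not make the fetch itself faster; it only overlaps your processing with the paging, and an iterator-backed stream tends to split less evenly than a List, so measure before relying on it.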
Is there any faster way I can process each row of the result set?
It depends on your query. What is it? If you're doing a full partition scan, only a couple of machines are doing the work. But if you're fetching data from multiple partitions, you can try to parallelize the reads by issuing multiple queries, one for each partition.
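To illustrate the one-query-per-partition idea, here is a rough sketch using only the JDK. The names PerPartitionFetch, fetchAll and fetchPartition are hypothetical; fetchPartition stands in for whatever runs your query for a single partition key and returns its rows. With the real driver, session.executeAsync() would probably be a better fit than blocking calls on a thread pool, but I have kept that out so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper, not part of the DataStax driver.
class PerPartitionFetch {

    // Runs fetchPartition once per key on a fixed-size pool and collects
    // the results in the same order as the keys.
    static <K, R> List<R> fetchAll(List<K> partitionKeys,
                                   Function<K, List<R>> fetchPartition,
                                   int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every query up front so they run concurrently.
            List<CompletableFuture<List<R>>> futures = new ArrayList<>();
            for (K key : partitionKeys) {
                futures.add(CompletableFuture.supplyAsync(() -> fetchPartition.apply(key), pool));
            }
            // Wait for each query in turn and merge the rows.
            List<R> rows = new ArrayList<>();
            for (CompletableFuture<List<R>> future : futures) {
                rows.addAll(future.join());
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake "query": each partition key yields two made-up rows.
        Function<Integer, List<String>> fakeQuery =
                key -> List.of("p" + key + "-row1", "p" + key + "-row2");

        List<String> rows = fetchAll(List.of(1, 2, 3), fakeQuery, 3);
        System.out.println(rows);
    }
}
```

Keep the number of threads modest: the aim is to spread the load over more nodes of the cluster, not to flood a single coordinator with requests.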