I have a question regarding the heap size needed while executing a select query over billions of rows.
I use JDBC with a prepared statement
and a fetch size of 1000 rows.
The code below illustrates my question:
ResultSet rs = ...
for (Row r : rs) {
    // If only FETCH_SIZE rows remain unconsumed and more pages exist,
    // trigger a fetch of the next page
    if (rs.getAvailableWithoutFetching() == FETCH_SIZE && !rs.isFullyFetched()) {
        LOGGER.info("Load " + FETCH_SIZE + " more rows");
        rs.fetchMoreResults();
    }
    ...
}
Does Java load the billions of rows at once, or FETCH_SIZE rows at a time?
Assuming you are using the DataStax driver, from the documentation for setFetchSize:
The fetch size controls how many result rows will be retrieved simultaneously (the goal being to avoid loading too many results in memory for queries yielding large results). Please note that while a value as low as 1 can be used, it is highly discouraged in practice as it will yield very poor performance. If in doubt, leaving the default is probably a good idea.
Only SELECT queries ever make use of that setting.
Note: Paging is not supported with the native protocol version 1. If you call this method with fetchSize > 0 and fetchSize != Integer.MAX_VALUE and protocol version 1 is in use (i.e. if you've forced version 1 through Cluster.Builder.withProtocolVersion(int) or you use Cassandra 1.2), you will get an UnsupportedProtocolVersionException when submitting this statement for execution.
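For reference, here is a minimal sketch of setting the fetch size, assuming the DataStax Java driver 3.x; the contact point and ks.big_table are placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build(); // placeholder contact point
Session session = cluster.connect();

Statement stmt = new SimpleStatement("SELECT * FROM ks.big_table"); // placeholder table
stmt.setFetchSize(1000); // rows per page, not a cap on the total result size
ResultSet rs = session.execute(stmt); // returns after retrieving only the first page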
So the driver does not keep previously fetched pages in memory, but you have to be careful not to hold references to those rows in your code so that they can be garbage collected. Also read the documentation for fetchMoreResults - it might not work the way you are expecting: it does not block, it only triggers an asynchronous prefetch of the next page, and plain iteration over the ResultSet already fetches pages on demand without it.
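To illustrate, a minimal sketch assuming the DataStax Java driver 3.x, with process as a hypothetical row handler:

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;

class Pager {

    // Plain iteration is enough: when the current page is exhausted the
    // driver blocks and fetches the next page on its own, so only about
    // one page of rows (the fetch size) is held in memory at a time.
    // fetchMoreResults() is merely an optional, non-blocking prefetch.
    static void consume(ResultSet rs) {
        for (Row row : rs) {
            process(row);
        }
    }

    // Hypothetical handler: process one row without storing it anywhere
    // long-lived, so already-consumed pages can be garbage collected.
    static void process(Row row) {
        System.out.println(row);
    }
}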