
PagingState for Statement in CQL


I was trying to understand how PagingState works with Statement in Cassandra. I wrote a sample that inserts a few thousand records into the database and then reads them back with the fetch size set to 10, using the paging state. This works perfectly. Here is my sample JUnit code:

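// DataStax Java driver 3.x and JUnit 4; StringUtils and RandomUtils are assumed to be Apache Commons Lang 3.
// cassandraTemplate and getSession() are the poster's own helpers (not shown).
import com.datastax.driver.core.PagingState;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.SimpleStatement;
import org.apache.commons.lang3.RandomUtils;
import org.apache.commons.lang3.StringUtils;
import org.junit.Before;
import org.junit.Test;

@Before
public void setup() {
    // Partition key a, clustering column b
    cassandraTemplate.executeQuery("create table if not exists pagesample(a int, b int, c int, primary key(a,b))");
    String insertQuery = "insert into pagesample(a,b,c) values(?,?,?)";
    PreparedStatement insertStmt = cassandraTemplate.getConnection().prepareStatement(insertQuery);
    // 5 partitions (a = 0..4), each holding 900 rows (b = 100..999)
    for(int i=0; i < 5; i++){
        for(int j=100; j<1000; j++){
            cassandraTemplate.executeQuery(insertStmt, new Object[]{i, j, RandomUtils.nextInt()});
        }
    }
}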

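@Test
public void testPagination() {
    String selectQuery = "select * from pagesample where a=?";
    String pagingStateStr = null;
    // 900 rows in partition a=1 at 10 rows per page = 90 pages
    for(int run=0; run<90; run++){
        ResultSet resultSet = selectRows(selectQuery, 10, pagingStateStr, 1);
        // Rows already delivered for this page; iterating past them
        // would make the driver fetch the next page automatically
        int fetchedCount = resultSet.getAvailableWithoutFetching();
        System.out.println(run+". Fetched size: "+fetchedCount);
        for(Row row : resultSet){
            System.out.print(row.getInt("b")+", ");
            if(--fetchedCount == 0){
                break;
            }
        }
        System.out.println();

        // Save the paging state so the next iteration resumes after this page;
        // it is null once there are no more pages
        PagingState pagingState = resultSet.getExecutionInfo().getPagingState();
        if(pagingState == null){
            break;
        }
        pagingStateStr = pagingState.toString();
    }
}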

public ResultSet selectRows(String cql, int fetchSize, String pagingState, Object... bindings){
    SimpleStatement simpleStatement = new SimpleStatement(cql, bindings);
    simpleStatement.setFetchSize(fetchSize);
    // Resume from a previously saved page, if there is one
    if(StringUtils.isNotEmpty(pagingState)){
        simpleStatement.setPagingState(PagingState.fromString(pagingState));
    }
    return getSession().execute(simpleStatement);
}

When I execute this program, every iteration of testPagination prints exactly 10 records. But here is what the documentation says:

  • Note that setting a fetch size doesn’t mean that Cassandra will always return the exact number of rows, it is possible that it returns slightly more or less results.

I don't understand why Cassandra would return a different number of rows than the fetch size specifies. Does this happen only when the query has no WHERE clause? Will it return exactly the requested number of rows when the query is restricted to a partition key? Please clarify.


Solution

  • From the CQL protocol specification:

    Clients should also not assert that no result will have more than result_page_size results. While the current implementation always respect the exact value of result_page_size, we reserve ourselves the right to return slightly smaller or bigger pages in the future for performance reasons

    So it's good practice to always rely on getAvailableWithoutFetching() rather than assuming each page holds exactly the fetch size, in case Cassandra changes its implementation in the future.
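To make that advice concrete, here is a minimal, self-contained Java sketch that does not talk to Cassandra at all. Everything in it is hypothetical: fetch simulates a server that treats the fetch size as a hint and returns 9, 10 or 11 rows per page in rotation (the kind of latitude the protocol reserves), PagingState is a stand-in for the driver's opaque token, and readAll plays the client. The client consumes exactly the rows each page delivered (the analogue of getAvailableWithoutFetching()) and resumes from the token until it is null, so every row is read once whatever the page sizes turn out to be.

```java
import java.util.ArrayList;
import java.util.List;

public class PagingSketch {

    // Hypothetical opaque resume token, playing the role of the driver's PagingState.
    static final class PagingState {
        final int offset;    // index of the next row to return
        final int pageIndex; // number of pages served so far
        PagingState(int offset, int pageIndex) {
            this.offset = offset;
            this.pageIndex = pageIndex;
        }
    }

    // Hypothetical page: the rows actually delivered plus the token for the
    // next page (null when the result is exhausted).
    static final class Page {
        final List<Integer> rows;
        final PagingState next;
        Page(List<Integer> rows, PagingState next) {
            this.rows = rows;
            this.next = next;
        }
    }

    // Simulated server: treats fetchSize as a hint and returns
    // fetchSize - 1, fetchSize, fetchSize + 1 rows in rotation.
    static Page fetch(List<Integer> table, int fetchSize, PagingState state) {
        int offset = (state == null) ? 0 : state.offset;
        int pageIndex = (state == null) ? 0 : state.pageIndex;
        int size = Math.max(1, fetchSize + (pageIndex % 3) - 1);
        int end = Math.min(table.size(), offset + size);
        List<Integer> rows = new ArrayList<>(table.subList(offset, end));
        PagingState next = (end < table.size()) ? new PagingState(end, pageIndex + 1) : null;
        return new Page(rows, next);
    }

    // Client loop: reads exactly the rows each page delivered, never assuming
    // fetchSize rows per page, and records each page's size in pageSizes.
    static List<Integer> readAll(List<Integer> table, int fetchSize, List<Integer> pageSizes) {
        List<Integer> seen = new ArrayList<>();
        PagingState state = null;
        do {
            Page page = fetch(table, fetchSize, state);
            int available = page.rows.size(); // analogue of getAvailableWithoutFetching()
            pageSizes.add(available);
            for (int i = 0; i < available; i++) {
                seen.add(page.rows.get(i));
            }
            state = page.next; // resume point for the next request
        } while (state != null);
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 95; i++) {
            table.add(i);
        }
        List<Integer> pageSizes = new ArrayList<>();
        List<Integer> seen = readAll(table, 10, pageSizes);
        System.out.println("Page sizes: " + pageSizes);
        System.out.println("Rows read: " + seen.size() + ", all in order: " + seen.equals(table));
    }
}
```

Running main prints page sizes of [9, 10, 11, 9, 10, 11, 9, 10, 11, 5] and confirms that all 95 rows were read in order. A loop that instead read exactly fetchSize rows from every page would throw an IndexOutOfBoundsException on the first 9-row page, and on an 11-row page it would silently skip a row.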