Search code examples
cassandradatastax-python-driver

Cassandra Python driver doesn't page large queries


It is said in the documentation that cassandra-driver does automatic paging when queries are large enough (with default_fetch_size being 5000 rows) and will return PagedResult.

I have tested reading data from my local Cassandra which contains 9999 rows with SimpleStatement with my own fetch size, but it returned the ResultSet (9999 rows) instead of pages (instance of PagedResult). Also, I tried to change the Session.default_fetch_size but it didn't work as well.

Here's my code..

My first attempt: This is the SimpleStatement code i have made to change the fetch size.

cluster = Cluster()
session = cluster.connect(keyspace_name)

query = "SELECT * FROM user"
statement = SimpleStatement(query, fetch_size=10)
rows = list(session.execute(statement))
print(len(rows))

It prints 9999 (all rows), not 10 rows as I already set the fetch_size.


My second attempt: I tried to change the query fetch size by changing session's default fetch size Session.default_fetch_size.

cluster = Cluster()
session = cluster.connect(keyspace_name)
session.default_fetch_size = 10

query = "SELECT * FROM user"
rows = list(session.execute(query))
print(len(rows))

It also prints 9999 rows instead of 10.


My goal is not to limit the rows from my fetch query, such as SELECT * FROM user LIMIT 10. What I want is to fetch the rows page by page to avoid overload on memory.

So what actually happened?

Note: I am using Cassandra-Driver 3.25 for Python and using Python3.7

I am sorry if my additional information still doesn't make my question a good one. I never ask any questions before. So...any suggestions are welcome :)


Solution

  • Your test is invalid because your code is faulty.

    When you list(), you are in fact "materialising" all the result pages. Your code is not iterating over the rows but retrieving all of the rows.

    The driver automatically fetches the next page in the background until there are no more pages to fetch. It may not seem like it but each page only contains fetch_size rows.

    Retrieving the next page happens transparently so to you it seems like the results are not getting paged at all but that automatic behaviour from the driver is working as designed. Cheers!