
NDB cursors not remembering some query data?


So, I have this query:

results, cursor, more = MyModel.query(
    ancestor=mykey,
).order(-MyModel.time).fetch_page(20)

So far so good, data returned is fine etc. Now, let's fetch some more, shall we? Seems logical to do just this:

results, cursor, more = MyModel.query() \
    .order(-MyModel.time) \
    .fetch_page(20, start_cursor=Cursor(urlsafe=request.cursor))

And... weird things happen. Definitely too many results, unordered results... What's going on? So I change it to:

results, cursor, more = MyModel.query(ancestor=mykey) \
    .order(-MyModel.time) \
    .fetch_page(20, start_cursor=Cursor(urlsafe=request.cursor))

Suddenly, way fewer results... let's add

.order(-MyModel.time)

And I get what I expected.

Now... Am I missing something here? Shouldn't passing the cursor already take care of the ordering and the ancestor? There is an ordering example for fetching the initial page in the documentation - https://cloud.google.com/appengine/docs/python/ndb/queries#cursors - but nowhere does it say that subsequent pages also require the ordering to be set. I would just like to know whether this is really working as intended or whether it's a bug.

If it's really working as intended, is there anywhere I can read about exactly what information is stored in a cursor? That would be really helpful for avoiding bugs like this in the future.


Solution

  • From Query Cursors (emphasis mine):

    A query cursor is a small opaque data structure representing a resumption point in a query. This is useful for showing a user a page of results at a time; it's also useful for handling long jobs that might need to stop and resume. A typical way to use them is with a query's fetch_page() method. It works somewhat like fetch(), but it returns a triple (results, cursor, more). The returned more flag indicates that there are probably more results; a UI can use this, for example, to suppress a "Next Page" button or link. To request subsequent pages, pass the cursor returned by one fetch_page() call into the next.

    A cursor exists (and makes sense) only in the context of the original query from which it was produced. You can't use a cursor produced by one query (the ancestor query in your case) to navigate the results of another query (your non-ancestor query). It might not barf (as your experiment proves), but the results are likely not what you expect :)

    Fundamentally, the cursor simply represents the current position (an index, if you want) in the list of the query's results. Using that index in some other list might not crash, but it won't make a lot of sense either (unless the two lists were specifically designed for it).

    It's probably a good habit to store the query in a variable and re-use it instead of re-building it every time, to avoid such accidental mistakes, as illustrated in the snippets.py example in that doc:

    # Set up.
    q = Bar.query()
    q_forward = q.order(Bar.key)
    q_reverse = q.order(-Bar.key)
    
    # Fetch a page going forward.
    bars, cursor, more = q_forward.fetch_page(10)
    
    # Fetch the same page going backward.
    r_bars, r_cursor, r_more = q_reverse.fetch_page(10, start_cursor=cursor)
    

    Side note: this example actually uses the cursor from one query to navigate results in another query, but the two queries are designed to be "compatible": they differ only in the direction of the sort order.
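
Applied to the query from the question, that habit might look roughly like the sketch below. It is only an illustration, not the documented way of doing it: `build_query` and `fetch_page_for` are hypothetical helper names, `model_cls` is assumed to be an NDB model with a `time` property (like `MyModel` in the question), and the `Cursor` import is done lazily only so that the module can also be loaded outside App Engine. The point is that the ancestor and the ordering are defined in exactly one place, so the first page and every later page run the same query.

```python
def build_query(model_cls, ancestor_key):
    """Build the one query that every page request should reuse (hypothetical helper)."""
    # The ancestor and the ordering belong to the query, not to the cursor,
    # so they must be identical for the first page and for all later pages.
    return model_cls.query(ancestor=ancestor_key).order(-model_cls.time)


def fetch_page_for(model_cls, ancestor_key, urlsafe_cursor=None, page_size=20):
    """Fetch one page; pass the urlsafe cursor string from the previous page, if any."""
    start_cursor = None
    if urlsafe_cursor:
        # Imported lazily so this module also loads outside App Engine.
        from google.appengine.datastore.datastore_query import Cursor
        start_cursor = Cursor(urlsafe=urlsafe_cursor)

    # Always rebuild the query through the same helper, never by hand.
    query = build_query(model_cls, ancestor_key)
    results, next_cursor, more = query.fetch_page(
        page_size, start_cursor=start_cursor)

    # Guard against a missing cursor before serializing it for the client.
    next_urlsafe = next_cursor.urlsafe() if next_cursor else None
    return results, next_urlsafe, more
```

A handler would then call `fetch_page_for(MyModel, mykey)` for the first page and `fetch_page_for(MyModel, mykey, request.cursor)` for the following ones, so there is no second copy of the query that could drift out of sync with the cursor.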