Search code examples
google-cloud-platformgoogle-cloud-spanner

How do I implement pagination?


I have a People table (Id, first_name, last_name), where the primary key is id. I want to be able to look up the first N people in the table ordered by (last_name, first_name, Id). In some cases, I need to lookup the next N people, and so on. I want to do this efficiently. What is the best way to do this?


Solution

  • There are two primary ways:

    • Use LIMIT and OFFSET
    • Use LIMIT and key-of-previous-page

    The OFFSET strategy lets you read an arbitrary page, but is not efficient since each time the query runs it must read the rows from all previous pages. It is the easiest to implement and can be an acceptable strategy (particularly if you only want the first few pages), but in general it is not recommended. The key-of-previous-page strategy does require pages are read in order, but is more efficient because each page reads only the rows it needs.

    So let's start with the original query to fetch results from your table ordered by (LastName, FirstName, Id):

    SELECT
      t.id, 
      t.first_name, 
      t.last_name
    FROM
      People as t
    ORDER BY
      t.last_name,
      t.first_name,
      t.id
    LIMIT
      @limit_rows
    

    You will probably want to ensure that your queries all view a consistent snapshot of your database's data, so you'll want to make sure that your sequence of queries always reads from the same timestamp. The easiest way to accomplish this is for your first query to be a ReadOnly transaction with returnReadTimestamp set to true. Then, your subsequent queries can also be ReadOnly transactions, and they should use the same timestamp returned by the original query as their readTimestamp. Note that -- whatever approach you choose -- the ORDER BY clause is vital to assure consistent results across your sequence of queries. Let's say the last row returned is (1709, "John", "Smith"). Then your first attempt at a query to get the next page of results might look like this:

    SELECT
      t.id, 
      t.first_name, 
      t.last_name
    FROM
      People as t
    WHERE
      t.last_name > "Smith"
      OR
      (t.last_name = "Smith" and t.first_name > "John")
      OR
      (t.last_name = "Smith" and t.first_name = "John" AND t.id > 1709)
    ORDER BY
      t.last_name,
      t.first_name,
      t.id
    LIMIT
      @limit_rows
    

    The middle WHERE clause is new. But writing this predicate is trickier than you might think. You may have to handle NULL values. You have to handle the case where there are multiple people named John Smith with different id values. And you would need to be really careful with floating point numbers and NaN values. Cloud Spanner's Read API can also be useful in cases like this, as it makes it easier to paginate a range scan on a table.