Search code examples
sqljsonpostgresqlpaginationaggregate-functions

Determine a target record's offset in a Postgres query


Problem: I'm building a RESTful API on top of Postgres that includes the following parameters:

  • An identifier <id> for a particular record in a table
  • Filter and sort parameters (optional)
  • count or page size (optional)

For example: /presidents/16?filter=!isAlive&sort=lastName,givenNames&count=5

The API returns count (or possibly fewer) records that include the record specified by <id> and the offset and count of records returned.

In the example above, the result might look like:

{
  "count": 5,
  "offset": 20,
  "records": [
    { "id": 17, "givenNames": "Andrew",    "lastName": "Johnson", "isAlive": false },
    { "id": 36, "givenNames": "Lyndon B.", "lastName": "Johnson", "isAlive": false },
    { "id": 35, "givenNames": "John F.",   "lastName": "Kennedy", "isAlive": false },
    { "id": 16, "givenNames": "Abraham",   "lastName": "Lincoln", "isAlive": false },
    { "id": 4,  "givenNames": "James",     "lastName": "Madison", "isAlive": false }
  ]
}

Current Solution: My current approach is to make three calls in serial (composing them into a single, nested query, but efficiency question remains) and I'm looking for a better approach.

  1. Get the record referred to by target <id>, used in query #2.

    • e.g. select * from presidents where id = 16
  2. Get the offset of the target <id>

    • e.g. select count(*) from presidents where lastName < 'Lincoln' and givenNames < 'Abraham' and not isAlive order by lastName, givenNames, id
  3. After computing the appropriate offset and limit, using the passed (or default) count and the count(*) from #2, retrieve the page of records.

    • e.g. select * from presidents where not isAlive order by lastName, givenNames, id offset 20 limit 5

Updated SQL

I took what @ErwinBrandstetter provided in his answer below, which was close to what I was looking for in way of a monster statement, and changed it into this:

WITH prez AS (SELECT lastName, givenNames, id, isAlive FROM presidents WHERE not isAlive),
cte AS (SELECT * FROM prez WHERE id = 16),
start AS (SELECT (COUNT(*)/5)*5 as "offset" FROM prez WHERE (lastName, givenNames, id, isAlive) < (TABLE cte))
SELECT row_to_json(sub2) AS js
FROM  (
 SELECT (SELECT * FROM start) AS "offset"
       , count(*) AS "count"
       , array_agg(sub1) AS records
 FROM  (
    SELECT * from prez
    ORDER  BY lastName, givenNames, id
    OFFSET (SELECT * from start) LIMIT  5
    ) sub1
) sub2;

SQL Fiddle

Is there an easier way in Postgres to determine the the offset of a given record for a given query?

Related questions:


Solution

  • The one-stop shop you are looking for:

    WITH cte AS (SELECT lastName, givenNames, id AS x FROM presidents WHERE id = 16)
    SELECT row_to_json(sub2) AS js
    FROM  (
       SELECT (SELECT count(*) FROM presidents
               WHERE (lastName, givenNames, id) < (TABLE cte)) AS "offset"
             , count(*) AS "count"
             , array_agg(sub1) AS records
       FROM  (
          SELECT id, givenNames, lastName, isAlive
          FROM   presidents
          WHERE (lastName, givenNames, id) >= (TABLE cte)
          ORDER  BY lastName, givenNames, id
          LIMIT  5
          ) sub1
      ) sub2;
    

    SQL Fiddle.

    Your original query #2 was incorrect:

    SELECT * FROM presidents
    WHERE lastName < 'Lincoln'
    AND   givenNames < 'Abraham'
    AND   id < 16 ...
    

    To preserve your sort order it must be:

    SELECT * FROM presidents
    WHERE (lastName, givenNames, id) < ('Lincoln', 'Abraham', 16) ...
    

    Comparing row types, not AND-ing expressions on individual columns, which would yield a completely different result. That's how ORDER BY effectively operates.

    You should have a multicolumn index on (lastName, givenNames, id) in addition to the PRIMARY KEY on id to make this fast.

    Variant with row_number() in a CTE

    Based on your updated requirements.

    WITH prez AS (
       SELECT lastName, givenNames, id, isAlive
            , row_number() OVER (ORDER BY lastName, givenNames, id) AS rn
       FROM   presidents
       WHERE  NOT isAlive
       )
    , x AS (SELECT ((rn-1)/5)*5 AS "offset" FROM prez WHERE id = 16)
    SELECT row_to_json(sub2) AS js
    FROM  (
       SELECT (SELECT "offset" FROM x)
            , count(*) AS "count"
            , array_agg(sub1) AS records
       FROM  (
          SELECT lastName, givenNames, id, isAlive
          FROM   prez
          WHERE  rn > (SELECT "offset" FROM x)
          ORDER  BY rn
          LIMIT  5
          ) sub1
       ) sub2;
    

    SQL Fiddle.

    Test performance with EXPLAIN ANALYZE.