Search code examples
postgresqlpostgresql-11

Returning value of id column of every n-th row in postgresql


I need to process the id column (primary key, integer) of a table with millions of records (about 25M). However, i am only interested in every n-th id.

Currently a simple approach is used:

select id from big order by id;

and then every n-th id is processed by the client software (cursor based).

I was wondering if this could not be much more efficient, if the selection of every n-th id is delegated to postgresql. Tried this one:

select id from 
    (select id, row_number() over (order by id) from big) _ 
    where row_number % 10000 = 0;`

However, this approach is much slower:

                                                                       QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on _  (cost=0.44..1291886.20 rows=115413 width=4) (actual time=9.385..10731.994 rows=2308 loops=1)
   Filter: ((_.row_number % '10000'::bigint) = 0)
   Rows Removed by Filter: 23080220
   ->  WindowAgg  (cost=0.44..945648.28 rows=23082528 width=12) (actual time=0.107..9450.396 rows=23082528 loops=1)
         ->  Index Only Scan using big_pkey on big  (cost=0.44..599410.36 rows=23082528 width=4) (actual time=0.093..2403.921 rows=23082528 loops=1)
               Heap Fetches: 0
 Planning Time: 0.172 ms
 Execution Time: 10732.229 ms
(8 rows)

The simple query has an execution time of 2721.101 ms (so almost 4 times faster).

Question: is there a better way to do this? (using PostgreSQL 11)


Solution

  • Create a cursor and fetch only every tenth row (also works for different intervals):

    BEGIN; -- must be in a transaction
    
    DECLARE cc CURSOR FOR
       SELECT id FROM big ORDER BY id;
    
    /* skip 9 rows */
    MOVE 9 IN cc;
    
    FETCH NEXT FROM cc;
    

    Continue performing the MOVE and FETCH in a loop until you run out of rows.