postgresql query-optimization postgresql-13

Multi column order by kills query performance even when the time range does not contain any records

I have a fairly small table of 26 million records.

CREATE TABLE t1
(
   cam         varchar(100)    NOT NULL,
   updatedat     timestamp,
   objid      varchar(40)     NOT NULL,
   image         varchar(100)    NOT NULL,
   reader      varchar(60)     NOT NULL,
   imgcap        timestamp       NOT NULL
);


ALTER TABLE t1
   ADD CONSTRAINT t1_pk
   PRIMARY KEY (reader, cam, image, objid, imgcap);

I have a simple query to iterate the records between a time range.

SELECT * FROM t1  
WHERE updatedat >= '2021-12-09 20:30:00'  and updatedat <= '2021-12-09 20:32:01'
ORDER BY reader ASC , imgcap ASC, objid ASC, cam ASC, image ASC
LIMIT 10000 
OFFSET 0;

I added an index to support the query with the comparison as the left most field and the remaining elements to support the sort.

CREATE INDEX t1_idtmp ON t1 USING btree (updatedat , reader , imgcap , objid, cam, image);

However, the query takes more than 10 seconds to get complete. It takes same time even if there are no elements in the range.

  ->  Incremental Sort  (cost=8.28..3809579.24 rows=706729 width=223) (actual time=11034.114..11065.710 rows=10000 loops=1)
        Sort Key: reader, imgcap, objid, cam, image
        Presorted Key: reader, imgcap
        Full-sort Groups: 62  Sort Method: quicksort  Average Memory: 42kB  Peak Memory: 42kB
        Pre-sorted Groups: 62  Sort Methods: top-N heapsort, quicksort  Average Memory: 58kB  Peak Memory: 58kB
        ->  Index Scan using t1_idxevtim on t1  (cost=0.56..3784154.75 rows=706729 width=223) (actual time=11033.613..11036.823 rows=10129 loops=1)
              Filter: ((updatedat >= '2021-12-09 20:30:00'::timestamp without time zone) AND (updatedat <= '2021-12-09 20:32:01'::timestamp without time zone))
              Rows Removed by Filter: 25415461
Planning Time: 0.137 ms
Execution Time: 11066.791 ms

There are couple of more indexes on table to support other use cases.

CREATE INDEX t1_idxua ON t1 USING btree (updatedat);

CREATE INDEX t1_idxevtim ON t1 USING btree (reader, imgcap);

I think, Postgresql wants to avoid an expensive sort and thinks that pre sorted key will be faster but why does Postgresql not use the t1_idtmp index as both search & sort can be satisfied with it?

Solution

why does Postgresql not use the t1_idtmp index as both search & sort can be satisfied with it?

Because the sort can't be satisfied by it. An btree index on (updatedat , reader , imgcap , objid, cam, image) can only produce data ordered by reader , imgcap , objid, cam, image for within ties of updatedat. So if your condition was for a specific value of updatedat, that would work. But since it is for a range of updatedat, that won't work as they are not all tied with each other.