Search code examples
postgresqlindexingdatabase-performancesqlperformance

Why PostgreSQL not using index properly?


Schema:

create table records(
  id         varchar,
  updated_at bigint
);
create index index1 on records (updated_at, id);

Query. It iterates over recently updated records. Fetches 10 records, remembers the last one and then fetches next 10 and so on.

select * from objects
where updated_at > '1' or (updated_at = '1' and id > 'some-id')
order by updated_at, id
limit 10;

It uses index, but it doesn't uses it wisely and also applies filter and processes tons of records, see Rows Removed by Filter: 31575 in query explanation below.

The strange thing is that if you remove or and leave either left or right condition - it works well for both. But it seems like if can't figure out how to apply index correctly if both conditions are used simultaneously with or.

Limit  (cost=0.42..19.03 rows=20 width=1336) (actual time=542.475..542.501 rows=20 loops=1)
   ->  Index Scan using index1 on records  (cost=0.42..426791.29 rows=458760 width=1336) (actual time=542.473..542.494 rows=20 loops=1)
         Filter: ((updated_at > '1'::bigint) OR ((updated_at = '1'::bigint) AND ((id)::text > 'some-id'::text)))
         Rows Removed by Filter: 31575
 Planning time: 0.180 ms
 Execution time: 542.532 ms
(6 rows)

Postgres version is 9.6


Solution

  • I would try this as two separate queries, combining their results like this:

    select *
    from
      (
        select   *
        from     objects
        where    updated_at > 1
        order by updated_at, id
        limit    10
        union all
        select   *
        from     objects
        where    updated_at = 1
          and    id > 'some-id'
        order by updated_at, id
        limit    10
      ) t
    order by updated_at, id
    limit    10
    

    My guess is that the two queries would each optimise pretty well and running both would be more efficient than the current one.

    I would also make those columns NOT NULL if possible.