Search code examples
sqlpostgresqlindexingsql-execution-planexplain

Postgresql: Seq Scan instead of Index Scan


I have following table:

create table if not exists inventory
(
    expired_at timestamp(0),
    -- ...
);

create index if not exists inventory_expired_at_index
    on inventory (expired_at);

However when I run following query:

EXPLAIN UPDATE "inventory" SET "status" = 'expired' WHERE "expired_at" < '2020-12-08 12:05:00';

I get next execution plan:

Update on inventory  (cost=0.00..4.09 rows=2 width=126)
  ->  Seq Scan on inventory  (cost=0.00..4.09 rows=2 width=126)
        Filter: (expired_at < '2020-12-08 12:05:00'::timestamp without time zone)

Same happens for big dataset:

EXPLAIN SELECT * FROM "inventory"  WHERE "expired_at" < '2020-12-08 12:05:00';
-[ RECORD 1 ]---------------------------------------------------------------------------
QUERY PLAN | Seq Scan on inventory  (cost=0.00..58616.63 rows=1281058 width=71)
-[ RECORD 2 ]---------------------------------------------------------------------------
QUERY PLAN |   Filter: (expired_at < '2020-12-08 12:05:00'::timestamp without time zone)

The question is: why not Index Scan but Seq Scan?


Solution

  • This is a bit long for a comment.

    The short answer is that you have two rows in the table, so it doesn't make a difference.

    The longer answer is that your are using an update, so the data rows have to be retrieved anyway. Using an index requires loading both the index and the data rows and then indirecting from the index to the data rows. It is a little more complicated. And with two rows, not worth the effort at all.

    The power of indexes is to handle large amounts of data, not small amounts of data.

    To respond to the large question: Database optimizers are not required to use an index. They use some sort of measures (often cost-based optimization) to determine whether or not an index is appropriate. In your larger example, the optimizer has determined that the index is not appropriate. This could happen if the statistics are out-of-synch with the underlying data.