Search code examples
sqlpostgresqlindexingsql-in

Postgres IN clause with many values not using partial index


I am using Postgres 9.2.24.

I have a table named _order with about 100,000,000 rows. The table has a column named merged_id int8. About 2,000,000 of the _order rows have a merged_id value, the rest has null.

I find two different Postgres behavior where I search _order use the query

select * from _order where merged_id in ( 10001 ,10002 ,10003 ....., 11000);

If I create an index like this:

create index order_merged_id_index on _order(merged_id);

No matter how many ids in in clause (test from 1 to 50 to 100 to 200 to 1000) EXPLAIN shows the search will use index_scan.

But if I create this partial index instead:

create index order_merged_id_index on _order(merged_id) where merged_id is not null;

EXPLAIN shows a seq_scan for more than 100 id numbers in the WHERE clause.

Why is this?
And is there any way to fix it?


Solution

  • You are running an outdated version of Postgres. Consider upgrading soon.

    There are many possible reasons. I suspect a weakness in the selectivity estimation of the outdated version. I vaguely remember a maximum of 100 values for query planning involving arrays that was later improved. IN expressions are typically transformed to = ANY (ARRAY[...]) internally:

    Either way, you might be able to fix the behavior by repeating the predicate of the partial index in your query:

    SELECT * FROM _order 
    WHERE merged_id IN ( 10001 ,10002 ,10003 ....., 11000)
    AND   merged_id is not null;  -- logically redundant
    

    There may be other problems with your server configuration like cost settings or table statistics:

    And don't forget to run ANALYZE on your table at least once after creating the partial index. Or, preferably, VACUUM ANALYZE, but that's more expensive for your big table.

    However, for long lists of values, there are more efficient query variants to begin with:

    SELECT o.*
    FROM   unnest('{10001 ,10002 ,10003 ....., 11000}'::int8[]) merged_id
    JOIN   _order o USING (merged_id);
    

    See: