I am using Postgres 9.2.24.
I have a table named _order with about 100,000,000 rows. The table has a column named merged_id of type int8. About 2,000,000 of the _order rows have a merged_id value; the rest are NULL.
I see two different Postgres behaviors when I search _order with this query:
select * from _order where merged_id in (10001, 10002, 10003, ....., 11000);
If I create an index like this:
create index order_merged_id_index on _order(merged_id);
No matter how many ids are in the IN clause (tested with 1, 50, 100, 200, and 1000 values), EXPLAIN shows that the search uses an index_scan.
But if I create this partial index instead:
create index order_merged_id_index on _order(merged_id) where merged_id is not null;
EXPLAIN shows a seq_scan once there are more than 100 id numbers in the WHERE clause.
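For reference, the plans were compared with plain EXPLAIN on the same query, roughly like this (id list abbreviated):
explain select * from _order where merged_id in (10001, 10002, 10003, ....., 11000);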
Why is this?
And is there any way to fix it?
You are running an outdated version of Postgres. Consider upgrading soon.
There are many possible reasons. I suspect a weakness in the selectivity estimation of the outdated version. I vaguely remember a maximum of 100 values for query planning involving arrays that was later improved. IN expressions are typically transformed to = ANY (ARRAY[...]) internally.
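To test that form directly, the query looks roughly like this (same abbreviated id list):
-- equivalent spelling of the IN list above; this is what the planner works with
SELECT * FROM _order WHERE merged_id = ANY (ARRAY[10001, 10002, 10003, ....., 11000]);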
Either way, you might be able to fix the behavior by repeating the predicate of the partial index in your query:
SELECT * FROM _order
WHERE  merged_id IN (10001, 10002, 10003, ....., 11000)
AND    merged_id IS NOT NULL;  -- logically redundant
There may also be other problems with your server configuration, like cost settings or table statistics.
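For example, a minimal sketch (the values are illustrative, not recommendations):
-- give the planner finer-grained statistics on merged_id; takes effect after the next ANALYZE
ALTER TABLE _order ALTER COLUMN merged_id SET STATISTICS 1000;
-- session-level cost tweak; only lower random_page_cost if your storage is fast
SET random_page_cost = 1.1;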
And don't forget to run ANALYZE on your table at least once after creating the partial index. Or, preferably, VACUUM ANALYZE, but that's more expensive for your big table.
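In the simplest form:
ANALYZE _order;
-- or, more thorough but more expensive on a table this size:
VACUUM ANALYZE _order;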
However, for long lists of values, there are more efficient query variants to begin with:
SELECT o.*
FROM unnest('{10001, 10002, 10003, ....., 11000}'::int8[]) merged_id
JOIN _order o USING (merged_id);
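This way the planner sees the unnested values as a small derived table it can join against, which typically estimates and scales better than one very long IN / = ANY list.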