I have table which has a few billion rows. There are two columns in the table:
match_id uuid,
group_id integer
And there are indexes created on both of the above columns:
create index if not exists my_tbl_match_id_idx on my_tbl (match_id);
create index if not exists my_tbl_group_id_idx on my_tbl (group_id);
When I perform following query, both indexes are scanned in parallel.
match_id index look up is much faster and returns result back in 13 milliseconds while group_id
index look up is slower and takes 1.3 seconds. Because both indexes are looked up in parallel, overall query time = 1.3 seconds i.e. look up time of group_id index.
Is there any way Postgres can just use group_id
index after the match_id
result set is computed?
SELECT *
FROM my_table
WHERE match_id = 'e089e0af-543b-45d5-abbf-22c6c5ed9a01'
AND (group_id IN (167,1704,1864,2065,2145,3812,3814,3855,7462,11393,11394,11396))
Or do I need to create a composite index for match_id
and group_id
?
There are many factors with bearing on the best course of action, most of which are not clear from the question. Consider instructions for postgresql-performance questions here.
Generally speaking, a multicolumn index on (match_id, group_id)
(with leading match_id
!) performs substantially better for your query than combining two separate indexes in bitmap index scans.
In effect, this index implements your requirement exactly:
Is there any way Postgres can just use
group_id
index after thematch_id
result set is computed?
A multicolumn index is sorted by leading expressions first. Using this index, Postgres excludes the lion share of irrelevant matches on group_id
for your query. Should be particularly impactful for your case since match_id
seems to be much more selective.
See:
This index also covers queries on just match_id
like you mentioned in your comment. Since match_id
is the more selective and bigger column (uuid
= 16 bytes, int = 4 bytes), the multicolumn index isn't much bigger than the one on just (match_id)
, and should work almost as well for the purpose, making the overhead for maintaining a second index questionable. The other index on just (group_id)
may still serve its purpose for queries on just group_id
.
See:
Your query can benefit a lot from physically clustering rows on match_id
. Use CLUSTER
or one of the less blocking community tools. See:
Only reasonable if there isn't too much write activity on the table (or large parts of it).
Also, if you don't really need SELECT *
, but only a small subset of columns, there is potential for optimization.