Search code examples
sqlpostgresqlquery-optimization

How to optimize this "select count" SQL? (postgres array comparision)


There is a table, has 10 million records, and it has a column which type is array, it looks like:

id |  content |  contained_special_ids 
----------------------------------------
1  |  abc     |  { 1, 2 }
2  |  abd     |  { 1, 3 }
3  |  abe     |  { 1, 4 }
4  |  abf     |  { 3 }
5  |  abg     |  { 2 }
6  |  abh     |  { 3 }

and I want to know that how many records there is which contained_special_ids includes 3, so my sql is:

select count(*) from my_table where contained_special_ids @> array[3]

It works fine when data is small, however it takes long time (about 30+ seconds) when the table has 10 million records.

I have added index to this column:

"index_my_table_on_contained_special_ids" gin (contained_special_ids)

So, how to optimize this select count query?

Thanks a lot!

UPDATE

below is the explain:


Finalize Aggregate  (cost=1049019.17..1049019.18 rows=1 width=8) (actual time=44343.230..44362.224 rows=1 loops=1)
  Output: count(*)
  ->  Gather  (cost=1049018.95..1049019.16 rows=2 width=8) (actual time=44340.332..44362.217 rows=3 loops=1)
        Output: (PARTIAL count(*))
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=1048018.95..1048018.96 rows=1 width=8) (actual time=44337.615..44337.615 rows=1 loops=3)
              Output: PARTIAL count(*)
              Worker 0:  actual time=44336.442..44336.442 rows=1 loops=1
              Worker 1:  actual time=44336.564..44336.564 rows=1 loops=1
              ->  Parallel Bitmap Heap Scan on public.my_table  (cost=9116.31..1046912.22 rows=442694 width=0) (actual time=330.602..44304.221 rows=391431 loops=3)
                    Recheck Cond: (my_table.contained_special_ids @> '{12511}'::bigint[])
                    Rows Removed by Index Recheck: 501077
                    Heap Blocks: exact=67496 lossy=109789
                    Worker 0:  actual time=329.547..44301.513 rows=409272 loops=1
                    Worker 1:  actual time=329.794..44304.582 rows=378538 loops=1
                    ->  Bitmap Index Scan on index_my_table_on_contained_special_ids  (cost=0.00..8850.69 rows=1062465 width=0) (actual time=278.413..278.414 rows=1176563 loops=1)
                          Index Cond: (my_table.contained_special_ids @> '{12511}'::bigint[])
Planning Time: 1.041 ms
Execution Time: 44362.262 ms


Solution

  • Increase work_mem until the lossy blocks go away. Also, make sure the table is well vacuumed to support index-only bitmap scans, and that you are using a new enough version (which you should tell us) to support those. Finally, you can try increasing effective_io_concurrency.

    Also, post plans as text, not images; and turn on track_io_timing.