Search code examples
sqlpostgresqlwhere-in

Slow WHERE IN End of Query


Sample data from my table test_table:

date           symbol      value      created_time
2010-01-09     symbol1     101        3847474847
2010-01-10     symbol1     102        3847474847
2010-01-10     symbol1     102.5      3847475500
2010-01-10     symbol2     204        3847474847
2010-01-11     symbol1     109        3847474847
2010-01-12     symbol1     105        3847474847
2010-01-12     symbol2     206        3847474847

Below is the query I am currently using on a table with about 3k unique symbols and about 11 million rows. It is still taking a while (over 80% of the query time is spent on the Subquery Scan at the end (which I think is the WHERE IN clause at the end of the query. Is there any way to speed this part up (imagine I have a lot of symbols in that WHERE IN lookup).

select date, symbol, value, created_time
from (select *,
    rank() over (partition by date, symbol order by created_time desc) as rownum
  from test_table) x
where rownum = 1 and symbol in ('symbol1', 'symbol2', 'symbol5', ...)

Below is an EXPLAIN ANALYZE output if that helps.

QUERY PLAN
Subquery Scan on x  (cost=281573.35..282473.76 rows=129 width=37) (actual time=2874.389..3037.008 rows=32393 loops=1)
  Filter: (x.rownum = 1)
  Rows Removed by Filter: 183
  ->  WindowAgg  (cost=281573.35..282152.19 rows=25726 width=37) (actual time=2874.363..2980.848 rows=32576 loops=1)
        ->  Sort  (cost=281573.35..281637.67 rows=25726 width=37) (actual time=2874.340..2901.443 rows=32576 loops=1)
              Sort Key: "test_table".date, "test_table".symbol, "test_table".created_time DESC
              Sort Method: quicksort  Memory: 3314kB
              ->  Seq Scan on "test_table"  (cost=0.00..279688.80 rows=25726 width=37) (actual time=118.338..2693.767 rows=32576 loops=1)
                    Filter: (symbol = ANY ('{symbol5,symbol8,symbol15,symbol98,symbol43,symbol908}'::text[]))
                    Rows Removed by Filter: 10649132
Planning time: 0.999 ms
Execution time: 3064.496 ms

Solution

  • Try using distinct on instead:

    select distinct on (symbol, date) date, symbol, value, created_time
    from test_table
    where symbol in ('symbol1', 'symbol2', 'symbol5', ...)
    order by symbol, date, created_time desc;
    

    And for this query, you want an index on test_table(symbol, date, created_time desc).