Tags: database, postgresql, database-performance

postgresql 9.6.4: timestamp range query on large table takes forever


I need some help analyzing the poor performance of a query on a large table containing 83,660,142 rows. Depending on the system load, it takes anywhere from 25 minutes to more than an hour to complete.

I've created the following table; it has a composite unique index serving as the key plus three additional indexes:

CREATE TABLE IF NOT EXISTS records(
 userid INT DEFAULT 0,
 clientid VARCHAR(255) DEFAULT '',
 ts TIMESTAMP,
 site VARCHAR(50) DEFAULT '',
 code VARCHAR(400) DEFAULT '');

CREATE UNIQUE INDEX IF NOT EXISTS primary_idx ON records (userid, clientid, ts, site, code);
CREATE INDEX IF NOT EXISTS userid_idx ON records (userid);
CREATE INDEX IF NOT EXISTS ts_idx ON records (ts);
CREATE INDEX IF NOT EXISTS userid_ts_idx ON records (userid ASC,ts DESC);   

In a Spring Batch application I'm executing a query that looks like this:

SELECT * 
  FROM records 
 WHERE userid = ANY(VALUES (2), ..., (96158 more userids) ) 
  AND ( ts < '2017-09-02' AND ts >= '2017-09-01' 
        OR ts < '2017-08-26' AND ts >= '2017-08-25' 
        OR ts < '2017-08-19' AND ts >= '2017-08-18' 
        OR ts < '2017-08-12' AND ts >= '2017-08-11') 

The user IDs are determined at runtime (their number lies between 95,000 and 110,000). For each user I need to extract the page views of the current day and of the same weekday in the previous three weeks. The query always returns between 3 and 4 million rows.
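
The four day boundaries are hard-coded into the statement by the batch job. Just for illustration, here is a minimal sketch (not part of the actual job) of how "the current day plus the same weekday of the previous three weeks" could also be derived in SQL:

-- four day windows: today and the same weekday 1, 2 and 3 weeks back
SELECT d::date                       AS day_start,
       (d + interval '1 day')::date  AS day_end
  FROM generate_series(current_date - interval '21 days',
                       current_date,
                       interval '7 days') AS g(d);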

Running the query with EXPLAIN ANALYZE produces the following execution plan.

Nested Loop  (cost=1483.40..1246386.43 rows=3761735 width=70) (actual time=108.856..1465501.596 rows=3643240 loops=1)
   ->  HashAggregate  (cost=1442.38..1444.38 rows=200 width=4) (actual time=33.277..201.819 rows=96159 loops=1)
     Group Key: "*VALUES*".column1
     ->  Values Scan on "*VALUES*"  (cost=0.00..1201.99 rows=96159 width=4) (actual time=0.006..11.599 rows=96159 loops=1)
   ->  Bitmap Heap Scan on records  (cost=41.02..6224.01 rows=70 width=70) (actual time=8.865..15.218 rows=38 loops=96159)
     Recheck Cond: (userid = "*VALUES*".column1)
     Filter: (((ts < '2017-09-02 00:00:00'::timestamp without time zone) AND (ts >= '2017-09-01 00:00:00'::timestamp without time zone)) OR ((ts < '2017-08-26 00:00:00'::timestamp without time zone) AND (ts >= '2017-08-25 00:00:00'::timestamp without time zone)) OR ((ts < '2017-08-19 00:00:00'::timestamp without time zone) AND (ts >= '2017-08-18 00:00:00'::timestamp without time zone)) OR ((ts < '2017-08-12 00:00:00'::timestamp without time zone) AND (ts >= '2017-08-11 00:00:00'::timestamp without time zone)))
     Rows Removed by Filter: 792
     Heap Blocks: exact=77251145
     ->  Bitmap Index Scan on userid_ts_idx  (cost=0.00..41.00 rows=1660 width=0) (actual time=6.593..6.593 rows=830 loops=96159)
           Index Cond: (userid = "*VALUES*".column1)

I've adjusted the values of some Postgres tuning parameters (unfortunately with no success):

  • effective_cache_size=15GB (probably useless as query is executed just once)
  • shared_buffers=15GB
  • work_mem=3GB
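
For completeness, a sketch of how such settings can be applied (values as listed above; shared_buffers only takes effect after a server restart, the other two can also be set per session):

-- persist the settings (ALTER SYSTEM is available since 9.4)
ALTER SYSTEM SET shared_buffers = '15GB';
ALTER SYSTEM SET effective_cache_size = '15GB';
ALTER SYSTEM SET work_mem = '3GB';
SELECT pg_reload_conf();  -- shared_buffers additionally requires a restart

-- or only for the current session / batch job
SET work_mem = '3GB';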

The application also runs computationally expensive tasks (e.g. data fusion/data injection) and consumes roughly 100 GB of memory, but the hardware is sufficiently dimensioned with 125 GB of RAM and 16 cores (OS: Debian).

I'm wondering why Postgres doesn't use the combined index userid_ts_idx for the timestamp condition as well: the Bitmap Index Scan only has an index condition on userid, and the timestamp ranges are applied as a filter afterwards. Since the timestamp column in the index is sorted in descending order, I would expect Postgres to use it for the range part of the query too: it could scan the index entries in descending ts order until the condition ts < '2017-09-02 00:00:00' holds and then return rows as long as ts >= '2017-09-01 00:00:00' is satisfied. Instead Postgres uses the expensive Bitmap Heap Scan, which, if I understood correctly, reads the matching table blocks one after another. Did I misconfigure the database settings or do I have a conceptual misunderstanding?
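
For comparison, with a single user ID and a single contiguous ts range both index columns can serve as index conditions, which is what I expected for the full query as well (a minimal check; the id and dates are picked arbitrarily):

-- single user, single contiguous range: both columns of userid_ts_idx usable
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
  FROM records
 WHERE userid = 2
   AND ts >= '2017-09-01' AND ts < '2017-09-02';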

Update

The CTE suggested in the comments unfortunately did not bring any improvement. The Bitmap Heap Scan has been replaced by a Sequential Scan, but performance is still poor. Here is the updated execution plan:

Merge Join  (cost=20564929.37..20575876.60 rows=685277 width=106) (actual time=2218133.229..2222280.192 rows=3907472 loops=1)
  Merge Cond: (ids.id = r.userid)
  Buffers: shared hit=2408684 read=181785
  CTE ids
    ->  Values Scan on "*VALUES*"  (cost=0.00..1289.70 rows=103176 width=4) (actual time=0.002..28.670 rows=103176 loops=1)
  CTE ts
    ->  Values Scan on "*VALUES*_1"  (cost=0.00..0.05 rows=4 width=32) (actual time=0.002..0.004 rows=4 loops=1)
  ->  Sort  (cost=10655.37..10913.31 rows=103176 width=4) (actual time=68.476..83.312 rows=103176 loops=1)
    Sort Key: ids.id
    Sort Method: quicksort  Memory: 7909kB
    ->  CTE Scan on ids  (cost=0.00..2063.52 rows=103176 width=4) (actual time=0.007..47.868 rows=103176 loops=1)
  ->  Sort  (cost=20552984.25..20554773.54 rows=715717 width=102) (actual time=2218059.941..2221230.585 rows=8085760 loops=1)
    Sort Key: r.userid
    Sort Method: quicksort  Memory: 1410084kB
    Buffers: shared hit=2408684 read=181785
    ->  Nested Loop  (cost=0.00..20483384.24 rows=715717 width=102) (actual time=885849.043..2214665.723 rows=8085767 loops=1)
          Join Filter: (ts.r @> r.ts)
          Rows Removed by Join Filter: 707630821
          Buffers: shared hit=2408684 read=181785
          ->  Seq Scan on records r  (cost=0.00..4379760.52 rows=178929152 width=70) (actual time=0.024..645616.135 rows=178929147 loops=1)
                Buffers: shared hit=2408684 read=181785
          ->  CTE Scan on ts  (cost=0.00..0.08 rows=4 width=32) (actual time=0.000..0.000 rows=4 loops=178929147)
Planning time: 126.110 ms
Execution time: 2222514.566 ms
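
For reference, the plan above corresponds to a query of roughly this shape (a sketch reconstructed from the ids and ts CTEs and the ts.r @> r.ts join filter; the actual statement passes the full ID list):

WITH ids(id) AS (
    VALUES (2), (3)  -- ... ~103k more user ids
),
ts(r) AS (
    VALUES ('[2017-09-01,2017-09-02)'::tsrange),
           ('[2017-08-25,2017-08-26)'::tsrange),
           ('[2017-08-18,2017-08-19)'::tsrange),
           ('[2017-08-11,2017-08-12)'::tsrange)
)
SELECT r.*
  FROM records r
  JOIN ids ON ids.id = r.userid
  JOIN ts  ON ts.r @> r.ts;  -- range containment instead of the OR'ed ts conditions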

Solution

  • You should get a different plan if you cast that timestamp to date and filter by a value list instead:

    CREATE INDEX IF NOT EXISTS userid_date_idx ON records (userid ASC, cast(ts AS date) DESC);

    SELECT *
      FROM records
     WHERE userid = ANY(VALUES (2), ..., (96158 more userids) )
       AND cast(ts AS date) IN ('2017-09-01', '2017-08-25', '2017-08-18', '2017-08-11');
    

    Whether it will perform better depends on your data and the date range; in my case I found that Postgres keeps using that index even when the date values cover the whole table (where a seq scan would be faster), so checking the actual plan, as sketched below, is the quickest way to tell.

    Demo
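
    A quick way to see whether the rewritten query actually picks the new expression index on your data is to look at the plan directly (a sketch with a shortened ID list):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
      FROM records
     WHERE userid = ANY(VALUES (2), (3))
       AND cast(ts AS date) IN ('2017-09-01', '2017-08-25', '2017-08-18', '2017-08-11');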