
How do I optimize a view for querying a ledger-like table in PostgreSQL?


I have a ledger-like table in a PostgreSQL database that tracks events for film comparisons on a site where users can add and remove comparisons between films. Each event is recorded as a timestamped row in the events table, and rows are never removed:

CREATE TABLE events
(
    date              timestamp with time zone NOT NULL,
    parent_film_id    varchar(8)               NOT NULL,
    comp_film_id      varchar(8)               NOT NULL,
    event_type        varchar(20)              NOT NULL
);

I'd like to create a view that represents the comparisons that currently exist for a given film_id. Importantly, the relationship is symmetric: if film A is a comparison for film B, then film B is also a comparison for film A.
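To make the symmetry requirement concrete, here is a small illustrative data set. The film IDs and timestamps are made up for the example and are not taken from the real table; the comments describe the result the view is meant to produce, not captured output.

```sql
-- Hypothetical sample events (IDs 'A', 'B', 'C' are invented for illustration).
INSERT INTO events (date, parent_film_id, comp_film_id, event_type) VALUES
    ('2020-01-01 10:00:00+00', 'A', 'B', 'create'),  -- A gets B as a comp
    ('2020-01-02 10:00:00+00', 'A', 'C', 'create'),  -- A gets C as a comp
    ('2020-01-03 10:00:00+00', 'A', 'C', 'remove');  -- ...and C is removed again

-- Desired contents of the view for this data:
--   parent_film_id = 'A', comp_film_id = 'B'   (the recorded direction)
--   parent_film_id = 'B', comp_film_id = 'A'   (the mirrored direction)
-- No row should mention 'C', because the latest event for A/C is a 'remove'.
```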

I have tried to create the view in the following way:

CREATE OR REPLACE VIEW comps AS (
  WITH bidirectional_events AS (
    (SELECT DISTINCT ON (e.comp_film_id, e.parent_film_id)
                     e.date,
                     e.parent_film_id AS comp_film_id,
                     e.comp_film_id AS parent_film_id,
                     e.event_type
                FROM events AS e
               WHERE e.event_type = 'create'
                  OR e.event_type = 'remove'
            ORDER BY e.comp_film_id, e.parent_film_id, date DESC)
    UNION
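    -- Explicit column list: UNION matches columns by position, not by name
    (SELECT DISTINCT ON (parent_film_id, comp_film_id)
                     date,
                     comp_film_id,
                     parent_film_id,
                     event_type
                FROM events
               WHERE event_type = 'create'
                  OR event_type = 'remove'
            ORDER BY parent_film_id, comp_film_id, date DESC))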
  SELECT date,
         comp_film_id,
         parent_film_id
    FROM bidirectional_events
   WHERE event_type = 'create');

However, querying this view for the comps of a single ID takes several hundred milliseconds. That is much slower than querying the events table directly for all of the events matching a single film (single-digit milliseconds).
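For reference, the slow query has roughly the following shape. This is a reconstruction based on the filter shown in the query plan further down (comp_film_id = '99196'), so the exact select list is an assumption.

```sql
-- Look up the current comps for one film through the view.
-- The ID '99196' is the one that appears in the EXPLAIN output.
SELECT date, comp_film_id, parent_film_id
  FROM comps
 WHERE comp_film_id = '99196';
```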

What can I do to speed the query up?

I've added the following indexes to the events table, but they haven't noticeably changed the query time of the view:

Indexes:
    "comp_idx" btree (comp_film_id)
    "comp_parent_date_desc_idx" btree (comp_film_id, parent_film_id, date DESC)
    "event_idx" btree (event_type)
    "parent_comp_date_desc_idx" btree (parent_film_id, comp_film_id, date DESC)
    "parent_idx" btree (parent_film_id)

The output of running EXPLAIN (ANALYZE, VERBOSE, BUFFERS) on a query against the view for the film with ID 99196 is:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                                                                                                                          |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CTE Scan on bidirectional_events  (cost=33198.98..34340.43 rows=1 width=168) (actual time=1233.483..1640.063 rows=33 loops=1)                                                                                                                       |
|   Output: bidirectional_events.date, bidirectional_events.comp_film_id, bidirectional_events.parent_film_id, bidirectional_events.territory_id, bidirectional_events.company_id, bidirectional_events.user_id, bidirectional_events.source_id       |
|   Filter: (((bidirectional_events.event_type)::text = 'create'::text) AND ((bidirectional_events.comp_film_id)::text = '99196'::text))                                                                                                           |
|   Rows Removed by Filter: 117790                                                                                                                                                                                                                    |
|   Buffers: shared hit=2571, temp read=1670 written=2491                                                                                                                                                                                             |
|   CTE bidirectional_events                                                                                                                                                                                                                          |
|     ->  Unique  (cost=32171.68..33198.98 rows=45658 width=226) (actual time=1227.012..1526.222 rows=117823 loops=1)                                                                                                                                 |
|           Output: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id                                                                                                                      |
|           Buffers: shared hit=2571, temp read=1670 written=1674                                                                                                                                                                                     |
|           ->  Sort  (cost=32171.68..32285.82 rows=45658 width=226) (actual time=1227.009..1328.931 rows=117838 loops=1)                                                                                                                             |
|                 Output: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id                                                                                                                |
|                 Sort Key: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id                                                                                                              |
|                 Sort Method: external merge  Disk: 6568kB                                                                                                                                                                                           |
|                 Buffers: shared hit=2571, temp read=1670 written=1674                                                                                                                                                                               |
|                 ->  Append  (cost=11140.74..23643.57 rows=45658 width=226) (actual time=298.843..1076.515 rows=117838 loops=1)                                                                                                                      |
|                       Buffers: shared hit=2562, temp read=849 written=851                                                                                                                                                                           |
|                       ->  Unique  (cost=11140.74..11593.50 rows=22829 width=41) (actual time=298.841..447.298 rows=58919 loops=1)                                                                                                                   |
|                             Output: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id                                                                                                    |
|                             Buffers: shared hit=1281, temp read=424 written=425                                                                                                                                                                     |
|                             ->  Sort  (cost=11140.74..11291.66 rows=60367 width=41) (actual time=298.838..354.656 rows=60875 loops=1)                                                                                                               |
|                                   Output: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id                                                                                              |
|                                   Sort Key: e.comp_film_id, e.parent_film_id, e.date DESC                                                                                                                                                           |
|                                   Sort Method: external merge  Disk: 3392kB                                                                                                                                                                         |
|                                   Buffers: shared hit=1281, temp read=424 written=425                                                                                                                                                               |
|                                   ->  Bitmap Heap Scan on public.events e  (cost=1317.57..4488.66 rows=60367 width=41) (actual time=3.593..55.910 rows=60875 loops=1)                                                                               |
|                                         Output: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id                                                                                        |
|                                         Recheck Cond: (((e.event_type)::text = 'create'::text) OR ((e.event_type)::text = 'remove'::text))                                                                                                    |
|                                         Heap Blocks: exact=1039                                                                                                                                                                                     |
|                                         Buffers: shared hit=1281
|                                         ->  BitmapOr  (cost=1317.57..1317.57 rows=60606 width=0) (actual time=3.457..3.461 rows=0 loops=1)
|                                               Buffers: shared hit=242
|                                               ->  Bitmap Index Scan on event_idx  (cost=0.00..1263.68 rows=59635 width=0) (actual time=3.346..3.347 rows=59059 loops=1)
|                                                     Index Cond: ((e.event_type)::text = 'create'::text)
|                                                     Buffers: shared hit=232
|                                               ->  Bitmap Index Scan on event_idx  (cost=0.00..23.70 rows=971 width=0) (actual time=0.107..0.108 rows=1816 loops=1)
|                                                     Index Cond: ((e.event_type)::text = 'remove'::text)
|                                                     Buffers: shared hit=10
|                       ->  Unique  (cost=11140.74..11593.50 rows=22829 width=41) (actual time=320.770..462.587 rows=58919 loops=1)
|                             Output: events.date, events.comp_film_id, events.parent_film_id, events.territory_id, events.company_id, events.user_id, events.event_type, events.source_id
|                             Buffers: shared hit=1281, temp read=425 written=426
|                             ->  Sort  (cost=11140.74..11291.66 rows=60367 width=41) (actual time=320.767..372.770 rows=60875 loops=1)
|                                   Output: events.date, events.comp_film_id, events.parent_film_id, events.territory_id, events.company_id, events.user_id, events.event_type, events.source_id
|                                   Sort Key: events.parent_film_id, events.comp_film_id, events.date DESC
|                                   Sort Method: external merge  Disk: 3400kB
|                                   Buffers: shared hit=1281, temp read=425 written=426
|                                   ->  Bitmap Heap Scan on public.events  (cost=1317.57..4488.66 rows=60367 width=41) (actual time=3.279..50.067 rows=60875 loops=1)
|                                         Output: events.date, events.comp_film_id, events.parent_film_id, events.territory_id, events.company_id, events.user_id, events.event_type, events.source_id
|                                         Recheck Cond: (((events.event_type)::text = 'create'::text) OR ((events.event_type)::text = 'remove'::text))
|                                         Heap Blocks: exact=1039
|                                         Buffers: shared hit=1281
|                                         ->  BitmapOr  (cost=1317.57..1317.57 rows=60606 width=0) (actual time=3.156..3.160 rows=0 loops=1)
|                                               Buffers: shared hit=242
|                                               ->  Bitmap Index Scan on event_idx  (cost=0.00..1263.68 rows=59635 width=0) (actual time=3.044..3.045 rows=59059 loops=1)
|                                                     Index Cond: ((events.event_type)::text = 'create'::text)
|                                                     Buffers: shared hit=232
|                                               ->  Bitmap Index Scan on event_idx  (cost=0.00..23.70 rows=971 width=0) (actual time=0.108..0.108 rows=1816 loops=1)
|                                                     Index Cond: ((events.event_type)::text = 'remove'::text)
|                                                     Buffers: shared hit=10
| Planning time: 0.885 ms
| Execution time: 1644.445 ms
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Solution

  • The solution to my particular problem was to avoid using a CTE in the view query. PostgreSQL 12 and later do not materialize a CTE defined in a WITH clause as long as it is non-recursive, has no side effects, and is referenced only once [1], but unfortunately I was running my query on PostgreSQL 10, where every CTE is materialized.

    To solve the issue, I rewrote the view query without a WITH clause. This allows the query optimiser to plan the view as a single SELECT and apply the filter on the film ID early, rather than first materializing every row of the view for every query:

    CREATE OR REPLACE VIEW comps AS (
        SELECT DISTINCT ON (parent_film_id, comp_film_id) date, comp_film_id, parent_film_id FROM (
          -- Mirrored direction: swap parent and comp
          (SELECT DISTINCT ON (e.comp_film_id, e.parent_film_id)
                           e.date,
                           e.parent_film_id AS comp_film_id,
                           e.comp_film_id AS parent_film_id,
                           e.event_type
                      FROM events AS e
                     WHERE e.event_type IN ('create', 'remove')
                  ORDER BY e.comp_film_id, e.parent_film_id, date DESC)
          UNION
          -- Recorded direction; explicit columns because UNION matches by position
          (SELECT DISTINCT ON (parent_film_id, comp_film_id)
                           date,
                           comp_film_id,
                           parent_film_id,
                           event_type
                      FROM events
                     WHERE event_type IN ('create', 'remove')
                  ORDER BY parent_film_id, comp_film_id, date DESC)) AS bidirectional
         WHERE event_type = 'create'
      ORDER BY parent_film_id, comp_film_id, date);
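For anyone on PostgreSQL 12 or later who would rather keep the CTE, the following is a minimal sketch of the same idea using the NOT MATERIALIZED keyword, which asks the planner to inline the CTE. It was not part of my fix (it is a syntax error on PostgreSQL 10), the view name comps_pg12 is a hypothetical one chosen for the example, and whether the filter is actually pushed down should be checked with EXPLAIN on your own data.

```sql
-- PostgreSQL 12+ only: NOT MATERIALIZED does not exist in earlier versions.
-- comps_pg12 is a hypothetical name, used so the original view is left alone.
CREATE VIEW comps_pg12 AS
WITH bidirectional_events AS NOT MATERIALIZED (
    -- Mirrored direction: swap parent and comp
    (SELECT DISTINCT ON (e.comp_film_id, e.parent_film_id)
            e.date,
            e.parent_film_id AS comp_film_id,
            e.comp_film_id   AS parent_film_id,
            e.event_type
       FROM events AS e
      WHERE e.event_type IN ('create', 'remove')
      ORDER BY e.comp_film_id, e.parent_film_id, e.date DESC)
    UNION
    -- Recorded direction; columns listed explicitly because UNION is positional
    (SELECT DISTINCT ON (parent_film_id, comp_film_id)
            date, comp_film_id, parent_film_id, event_type
       FROM events
      WHERE event_type IN ('create', 'remove')
      ORDER BY parent_film_id, comp_film_id, date DESC)
)
SELECT date, comp_film_id, parent_film_id
  FROM bidirectional_events
 WHERE event_type = 'create';

-- Inspect the plan: if the CTE was inlined, there should be no
-- "CTE Scan on bidirectional_events" node at the top.
EXPLAIN (ANALYZE, BUFFERS)
SELECT date, comp_film_id, parent_film_id
  FROM comps_pg12
 WHERE comp_film_id = '99196';
```

The same EXPLAIN statement can be run against the rewritten comps view above to confirm that the film ID condition shows up on the scans of events rather than as a filter over the full result.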