I have a ledger-like table in a PostgreSQL database that tracks events for film comparisons on a site where users can add and remove film comparisons. Each event is recorded as a timestamped row in the events table. Rows are never removed:
date timestamp with time zone NOT NULL,
parent_film_id varchar(8) NOT NULL,
comp_film_id varchar(8) NOT NULL,
event_type varchar(20) NOT NULL
I'd like to create a view that represents the currently existing comparisons for a given film_id. What's important is that if film A is a comparison for film B, then film B is also a comparison for film A.
I have tried to create the view in the following way:
WITH bidirectional_events AS (
    -- Every event seen from the other film's side (parent and comp swapped)
    (SELECT DISTINCT ON (e.comp_film_id, e.parent_film_id)
        e.date,
        e.parent_film_id AS comp_film_id,
        e.comp_film_id AS parent_film_id,
        e.event_type
    FROM events AS e
    WHERE e.event_type = 'create'
       OR e.event_type = 'remove'
    ORDER BY e.comp_film_id, e.parent_film_id, date DESC)
    UNION
    -- Every event as recorded
    (SELECT DISTINCT ON (parent_film_id, comp_film_id)
        date,
        comp_film_id,
        parent_film_id,
        event_type
    FROM events
    WHERE event_type = 'create'
       OR event_type = 'remove'
    ORDER BY parent_film_id, comp_film_id, date DESC))
SELECT date,
    comp_film_id,
    parent_film_id
FROM bidirectional_events
WHERE event_type = 'create';
However, it takes several hundred ms to query this view for the comps of a single ID, which is much slower than querying for all of the events matching a single film (single-digit ms).
What can I do to speed the query up?
I've added the following indices to the events table, but they have not significantly changed the query time of the view:
"comp_idx" btree (comp_film_id)
"comp_parent_date_desc_idx" btree (comp_film_id, parent_film_id, date DESC)
"event_idx" btree (event_type)
"parent_comp_date_desc_idx" btree (parent_film_id, comp_film_id, date DESC)
"parent_idx" btree (parent_film_id)
The output of running EXPLAIN ANALYZE on a query against the view for the film with ID 99196:
CTE Scan on bidirectional_events (cost=33198.98..34340.43 rows=1 width=168) (actual time=1233.483..1640.063 rows=33 loops=1)
  Output: bidirectional_events.date, bidirectional_events.comp_film_id, bidirectional_events.parent_film_id, bidirectional_events.territory_id, bidirectional_events.company_id, bidirectional_events.user_id, bidirectional_events.source_id
  Filter: (((bidirectional_events.event_type)::text = 'create'::text) AND ((bidirectional_events.comp_film_id)::text = '99196'::text))
  Rows Removed by Filter: 117790
  Buffers: shared hit=2571, temp read=1670 written=2491
  CTE bidirectional_events
    ->  Unique (cost=32171.68..33198.98 rows=45658 width=226) (actual time=1227.012..1526.222 rows=117823 loops=1)
          Output: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id
          Buffers: shared hit=2571, temp read=1670 written=1674
          ->  Sort (cost=32171.68..32285.82 rows=45658 width=226) (actual time=1227.009..1328.931 rows=117838 loops=1)
                Output: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id
                Sort Key: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id
                Sort Method: external merge Disk: 6568kB
                Buffers: shared hit=2571, temp read=1670 written=1674
                ->  Append (cost=11140.74..23643.57 rows=45658 width=226) (actual time=298.843..1076.515 rows=117838 loops=1)
                      Buffers: shared hit=2562, temp read=849 written=851
                      ->  Unique (cost=11140.74..11593.50 rows=22829 width=41) (actual time=298.841..447.298 rows=58919 loops=1)
                            Output: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id
                            Buffers: shared hit=1281, temp read=424 written=425
                            ->  Sort (cost=11140.74..11291.66 rows=60367 width=41) (actual time=298.838..354.656 rows=60875 loops=1)
                                  Output: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id
                                  Sort Key: e.comp_film_id, e.parent_film_id, e.date DESC
                                  Sort Method: external merge Disk: 3392kB
                                  Buffers: shared hit=1281, temp read=424 written=425
                                  ->  Bitmap Heap Scan on public.events e (cost=1317.57..4488.66 rows=60367 width=41) (actual time=3.593..55.910 rows=60875 loops=1)
                                        Output: e.date, e.parent_film_id, e.comp_film_id, e.territory_id, e.company_id, e.user_id, e.event_type, e.source_id
                                        Recheck Cond: (((e.event_type)::text = 'create'::text) OR ((e.event_type)::text = 'remove'::text))
                                        Heap Blocks: exact=1039
                                        Buffers: shared hit=1281
                                        ->  BitmapOr (cost=1317.57..1317.57 rows=60606 width=0) (actual time=3.457..3.461 rows=0 loops=1)
                                              Buffers: shared hit=242
                                              ->  Bitmap Index Scan on event_idx (cost=0.00..1263.68 rows=59635 width=0) (actual time=3.346..3.347 rows=59059 loops=1)
                                                    Index Cond: ((e.event_type)::text = 'create'::text)
                                                    Buffers: shared hit=232
                                              ->  Bitmap Index Scan on event_idx (cost=0.00..23.70 rows=971 width=0) (actual time=0.107..0.108 rows=1816 loops=1)
                                                    Index Cond: ((e.event_type)::text = 'remove'::text)
                                                    Buffers: shared hit=10
                      ->  Unique (cost=11140.74..11593.50 rows=22829 width=41) (actual time=320.770..462.587 rows=58919 loops=1)
                            Output: events.date, events.comp_film_id, events.parent_film_id, events.territory_id, events.company_id, events.user_id, events.event_type, events.source_id
                            Buffers: shared hit=1281, temp read=425 written=426
                            ->  Sort (cost=11140.74..11291.66 rows=60367 width=41) (actual time=320.767..372.770 rows=60875 loops=1)
                                  Output: events.date, events.comp_film_id, events.parent_film_id, events.territory_id, events.company_id, events.user_id, events.event_type, events.source_id
                                  Sort Key: events.parent_film_id, events.comp_film_id, events.date DESC
                                  Sort Method: external merge Disk: 3400kB
                                  Buffers: shared hit=1281, temp read=425 written=426
                                  ->  Bitmap Heap Scan on public.events (cost=1317.57..4488.66 rows=60367 width=41) (actual time=3.279..50.067 rows=60875 loops=1)
                                        Output: events.date, events.comp_film_id, events.parent_film_id, events.territory_id, events.company_id, events.user_id, events.event_type, events.source_id
                                        Recheck Cond: (((events.event_type)::text = 'create'::text) OR ((events.event_type)::text = 'remove'::text))
                                        Heap Blocks: exact=1039
                                        Buffers: shared hit=1281
                                        ->  BitmapOr (cost=1317.57..1317.57 rows=60606 width=0) (actual time=3.156..3.160 rows=0 loops=1)
                                              Buffers: shared hit=242
                                              ->  Bitmap Index Scan on event_idx (cost=0.00..1263.68 rows=59635 width=0) (actual time=3.044..3.045 rows=59059 loops=1)
                                                    Index Cond: ((events.event_type)::text = 'create'::text)
                                                    Buffers: shared hit=232
                                              ->  Bitmap Index Scan on event_idx (cost=0.00..23.70 rows=971 width=0) (actual time=0.108..0.108 rows=1816 loops=1)
                                                    Index Cond: ((events.event_type)::text = 'remove'::text)
                                                    Buffers: shared hit=10
Planning time: 0.885 ms
Execution time: 1644.445 ms
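For anyone wanting to reproduce a plan like this, a command along the following lines should do it. This is only a sketch: the view name current_comps is a placeholder I made up, since the view's real name isn't given here. The options are the ones that produce the "actual time", "Output" and "Buffers" lines seen in the plan.

```sql
-- Hypothetical view name: current_comps (the real name is not given above).
-- ANALYZE runs the query and reports actual timings and row counts,
-- VERBOSE adds the "Output:" lines, BUFFERS adds the "Buffers:" lines.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT *
FROM current_comps
WHERE comp_film_id = '99196';
```

The line to look for is "Rows Removed by Filter" on the top node: a large number there means the film ID filter is only applied after the whole CTE has been computed.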
The solution to my particular problem was to not use a CTE in the view query. From PostgreSQL 12 onwards, a CTE defined in a WITH clause is no longer materialized as long as it is non-recursive, has no side effects and is referenced only once, but unfortunately I was running my query on PostgreSQL 10, where a CTE is always materialized.
To solve the issue, I changed the view query so that it no longer uses a WITH clause. This allowed the query optimiser to plan the whole thing as a single select and apply the film ID filter inside the subqueries, rather than first materializing the entire view for every query:
SELECT DISTINCT ON (parent_film_id, comp_film_id) date, comp_film_id, parent_film_id FROM (
    -- Every event seen from the other film's side (parent and comp swapped)
    (SELECT DISTINCT ON (e.comp_film_id, e.parent_film_id)
        e.date,
        e.parent_film_id AS comp_film_id,
        e.comp_film_id AS parent_film_id,
        e.event_type
    FROM events AS e
    WHERE e.event_type IN ('create', 'remove')
    ORDER BY e.comp_film_id, e.parent_film_id, date DESC)
    UNION
    -- Every event as recorded
    (SELECT DISTINCT ON (parent_film_id, comp_film_id)
        date,
        comp_film_id,
        parent_film_id,
        event_type
    FROM events
    WHERE event_type IN ('create', 'remove')
    ORDER BY parent_film_id, comp_film_id, date DESC)) AS bidirectional
WHERE event_type = 'create'
ORDER BY parent_film_id, comp_film_id, date;
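For readers on PostgreSQL 12 or later who would rather keep the CTE, the inlining can also be requested explicitly with NOT MATERIALIZED. I have not run this against my own data (I was on PostgreSQL 10), so treat it as a sketch: the view name current_comps is made up, and the column list assumes only the four columns shown in the table definition.

```sql
-- Requires PostgreSQL 12+. The view name current_comps is hypothetical.
CREATE VIEW current_comps AS
WITH bidirectional_events AS NOT MATERIALIZED (
    -- Every event seen from the other film's side (parent and comp swapped)
    (SELECT DISTINCT ON (e.comp_film_id, e.parent_film_id)
        e.date,
        e.parent_film_id AS comp_film_id,
        e.comp_film_id AS parent_film_id,
        e.event_type
    FROM events AS e
    WHERE e.event_type IN ('create', 'remove')
    ORDER BY e.comp_film_id, e.parent_film_id, e.date DESC)
    UNION
    -- Every event as recorded
    (SELECT DISTINCT ON (parent_film_id, comp_film_id)
        date,
        comp_film_id,
        parent_film_id,
        event_type
    FROM events
    WHERE event_type IN ('create', 'remove')
    ORDER BY parent_film_id, comp_film_id, date DESC)
)
SELECT date, comp_film_id, parent_film_id
FROM bidirectional_events
WHERE event_type = 'create';
```

On 12+ a CTE that is referenced once is normally inlined anyway, so the keyword mostly serves to make the intent explicit and to keep the behaviour if the CTE is later referenced a second time.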