Postgres behaves radically differently on parameters that don't change the result.
-- This query is fast (~ 40 ms)
WITH filtered_scans AS (
SELECT id FROM scans
WHERE scans."createdAt" BETWEEN NOW() - INTERVAL '26 day' AND NOW() AND
scans."applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'
)
SELECT "scanId", percentile_disc(0.5) WITHIN GROUP (ORDER BY duration) AS p50
FROM scan_exchanges
WHERE "scanId" IN (SELECT id FROM filtered_scans)
GROUP BY "scanId";
-- This one is slow (~ 2 s)
EXPLAIN ANALYZE WITH filtered_scans AS (
SELECT id FROM scans
WHERE scans."createdAt" BETWEEN NOW() - INTERVAL '27 day' AND NOW() AND
-- ⬆️ the only difference is here
scans."applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'
)
SELECT "scanId", percentile_disc(0.5) WITHIN GROUP (ORDER BY duration) AS p50
FROM scan_exchanges
WHERE "scanId" IN (SELECT id FROM filtered_scans)
GROUP BY "scanId";
In both cases, the filtered_scans
table is the same:
id |
---|
2a823963-d019-4066-aab5-fed94b5f05fd |
4684e3c7-bbba-4fb4-909c-fc6d512f5555 |
abf56697-2400-452c-a68a-e243d53bafb0 |
And the resulting table is also the same:
scanId | p50 |
---|---|
2a823963-d019-4066-aab5-fed94b5f05fd | 0,09799999999813735 |
4684e3c7-bbba-4fb4-909c-fc6d512f5555 | 0,07500000001164153 |
abf56697-2400-452c-a68a-e243d53bafb0 | 0,1159999999217689 |
The table scan_exchanges
has an index on scanId
named scan_exchanges_scanId_idx
. Table columns:
CREATE TABLE "scans" (
"id" UUID NOT NULL,
"createdAt" TIMESTAMP NOT NULL DEFAULT 'CURRENT_TIMESTAMP',
"applicationId" UUID NOT NULL,
PRIMARY KEY ("id"),
INDEX "scans_applicationId_idx" ("applicationId")
)
CREATE TABLE "scan_exchanges" (
"id" UUID NOT NULL,
"duration" DOUBLE PRECISION NOT NULL,
"scanId" UUID NOT NULL,
PRIMARY KEY ("id"),
INDEX "scan_exchanges_scanId_idx" ("scanId"),
CONSTRAINT "scan_exchanges_scanId_fkey" FOREIGN KEY ("scanId") REFERENCES "scans" ("id") ON UPDATE CASCADE ON DELETE CASCADE
)
-- I removed some fields irrelevant to the question, tell me if I removed too much
I tried running ANALYZE EXPLAIN
on the queries and Postgres 13.9.0 returned wildly different results:
Fast query:
QUERY PLAN
GroupAggregate (cost=26.03..308595.21 rows=818 width=24) (actual time=24.336..26.808 rows=3 loops=1)
Group Key: scan_exchanges."scanId"
-> Nested Loop (cost=26.03..308330.41 rows=50915 width=24) (actual time=23.299..26.374 rows=1414 loops=1)
-> Index Scan using scans_pkey on scans (cost=0.28..1488.47 rows=71 width=16) (actual time=23.211..24.283 rows=3 loops=1)
Filter: (("applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'::uuid) AND ("createdAt" <= now()) AND ("createdAt" >= (now() - '26 days'::interval)))
Rows Removed by Filter: 1482
-> Bitmap Heap Scan on scan_exchanges (cost=25.75..4308.74 rows=1298 width=24) (actual time=0.054..0.623 rows=471 loops=3)
Recheck Cond: ("scanId" = scans.id)
Heap Blocks: exact=492
-> Bitmap Index Scan on "scan_exchanges_scanId_idx" (cost=0.00..25.43 rows=1298 width=0) (actual time=0.034..0.034 rows=509 loops=3)
Index Cond: ("scanId" = scans.id)
Planning Time: 0.393 ms
JIT:
Functions: 14
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 2.845 ms, Inlining 0.000 ms, Optimization 1.138 ms, Emission 21.245 ms, Total 25.228 ms
Execution Time: 29.853 ms
Slow query:
QUERY PLAN
GroupAggregate (cost=306783.39..307261.53 rows=818 width=24) (actual time=2028.021..2028.437 rows=3 loops=1)
Group Key: scan_exchanges."scanId"
-> Sort (cost=306783.39..306939.36 rows=62389 width=24) (actual time=2027.787..2027.933 rows=1414 loops=1)
Sort Key: scan_exchanges."scanId"
Sort Method: quicksort Memory: 159kB
-> Hash Join (cost=370.11..301814.42 rows=62389 width=24) (actual time=179.534..2027.085 rows=1414 loops=1)
Hash Cond: (scan_exchanges."scanId" = scans.id)
-> Seq Scan on scan_exchanges (cost=0.00..298648.46 rows=1062046 width=24) (actual time=25.341..1843.927 rows=1067166 loops=1)
-> Hash (cost=369.02..369.02 rows=87 width=16) (actual time=0.538..0.540 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Bitmap Heap Scan on scans (cost=10.71..369.02 rows=87 width=16) (actual time=0.100..0.534 rows=3 loops=1)
Recheck Cond: ("applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'::uuid)
Filter: (("createdAt" <= now()) AND ("createdAt" >= (now() - '27 days'::interval)))
Rows Removed by Filter: 319
Heap Blocks: exact=156
-> Bitmap Index Scan on "scans_applicationId_idx" (cost=0.00..10.69 rows=322 width=0) (actual time=0.042..0.042 rows=322 loops=1)
Index Cond: ("applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'::uuid)
Planning Time: 0.689 ms
JIT:
Functions: 19
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 6.831 ms, Inlining 0.000 ms, Optimization 1.793 ms, Emission 23.113 ms, Total 31.737 ms
Execution Time: 2035.574 ms
I'm not experimented enough to understand what made Postgres create two different query plans.
What can I do to have Postgres use the fast query plan, regardless of the filter on scans."createdAt"
?
Edit (29/03/2023):
I tried adding an index over ("applicationId", "createdAt", "id")
but it unfortunately does not make the query more predictable:
Slow query:
QUERY PLAN
GroupAggregate (cost=299278.22..305644.21 rows=818 width=24) (actual time=1122.330..1238.101 rows=3 loops=1)
Group Key: scan_exchanges."scanId"
-> Gather Merge (cost=299278.22..305366.87 rows=53423 width=24) (actual time=1121.841..1237.594 rows=1414 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=298278.21..298356.78 rows=31425 width=24) (actual time=1078.875..1078.947 rows=707 loops=2)
Sort Key: scan_exchanges."scanId"
Sort Method: quicksort Memory: 101kB
Worker 0: Sort Method: quicksort Memory: 59kB
-> Hash Join (cost=10.91..295930.82 rows=31425 width=24) (actual time=132.823..1078.481 rows=707 loops=2)
Hash Cond: (scan_exchanges."scanId" = scans.id)
-> Parallel Seq Scan on scan_exchanges (cost=0.00..294275.33 rows=624733 width=24) (actual time=17.585..986.391 rows=535626 loops=2)
-> Hash (cost=9.97..9.97 rows=75 width=16) (actual time=0.093..0.095 rows=3 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Only Scan using app_created_id on scans (cost=0.29..9.97 rows=75 width=16) (actual time=0.087..0.089 rows=3 loops=2)
Index Cond: (("applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'::uuid) AND ("createdAt" >= (now() - '27 days'::interval)) AND ("createdAt" <= now()))
Heap Fetches: 0
Planning Time: 0.436 ms
JIT:
Functions: 26
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 4.929 ms, Inlining 0.000 ms, Optimization 2.230 ms, Emission 32.063 ms, Total 39.222 ms
Execution Time: 1241.326 ms
(The fast query does not change and does not rely on the newly created index)
In this specific case, it turns out setting random_page_cost
closer to seq_page_cost
makes the query planner more predictable.
SHOW seq_page_cost; -- 1.0
SHOW random_page_cost; -- 4.0
SET random_page_cost = 1.2;
Thank you @LaurenzAlbe for the tip and all others who helped
See also: