I am trying to debug a query that runs faster the more records it returns but performance severely degrades (>10x slower) with smaller returns (i.e. <10 rows) using small LIMIT
(ie 10).
Example:
Fast query with 5 results out of 1M rows - no LIMIT
SELECT *
FROM transaction_internal_by_addresses
WHERE address = 'foo'
ORDER BY block_number desc;
Explain:
Sort (cost=7733.14..7749.31 rows=6468 width=126) (actual time=0.030..0.031 rows=5 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Sort Key: transaction_internal_by_addresses.block_number
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=10
-> Index Scan using transaction_internal_by_addresses_pkey on public.transaction_internal_by_addresses (cost=0.69..7323.75 rows=6468 width=126) (actual time=0.018..0.021 rows=5 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Index Cond: (transaction_internal_by_addresses.address = 'foo'::text)
Buffers: shared hit=10
Query Identifier: -8912211611755432198
Planning Time: 0.051 ms
Execution Time: 0.041 ms
Fast query with 5 results out of 1M rows: - High LIMIT
SELECT *
FROM transaction_internal_by_addresses
WHERE address = 'foo'
ORDER BY block_number desc
LIMIT 100;
Limit (cost=7570.95..7571.20 rows=100 width=126) (actual time=0.024..0.025 rows=5 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Buffers: shared hit=10
-> Sort (cost=7570.95..7587.12 rows=6468 width=126) (actual time=0.023..0.024 rows=5 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Sort Key: transaction_internal_by_addresses.block_number DESC
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=10
-> Index Scan using transaction_internal_by_addresses_pkey on public.transaction_internal_by_addresses (cost=0.69..7323.75 rows=6468 width=126) (actual time=0.016..0.020 rows=5 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Index Cond: (transaction_internal_by_addresses.address = 'foo'::text)
Buffers: shared hit=10
Query Identifier: 3421253327669991203
Planning Time: 0.042 ms
Execution Time: 0.034 ms
Slow query: - Low LIMIT
SELECT *
FROM transaction_internal_by_addresses
WHERE address = 'foo'
ORDER BY block_number desc
LIMIT 10;
Explain result:
Limit (cost=1000.63..6133.94 rows=10 width=126) (actual time=10277.845..11861.269 rows=0 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Buffers: shared hit=56313576
-> Gather Merge (cost=1000.63..3333036.90 rows=6491 width=126) (actual time=10277.844..11861.266 rows=0 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=56313576
-> Parallel Index Scan Backward using transaction_internal_by_address_idx_block_number on public.transaction_internal_by_addresses (cost=0.57..3331263.70 rows=1623 width=126) (actual time=10256.995..10256.995 rows=0 loops=5)
" Output: address, block_number, log_index, transaction_hash"
Filter: (transaction_internal_by_addresses.address = 'foo'::text)
Rows Removed by Filter: 18485480
Buffers: shared hit=56313576
Worker 0: actual time=10251.822..10251.823 rows=0 loops=1
Buffers: shared hit=11387166
Worker 1: actual time=10250.971..10250.972 rows=0 loops=1
Buffers: shared hit=10215941
Worker 2: actual time=10252.269..10252.269 rows=0 loops=1
Buffers: shared hit=10191990
Worker 3: actual time=10252.513..10252.514 rows=0 loops=1
Buffers: shared hit=10238279
Query Identifier: 2050754902087402293
Planning Time: 0.081 ms
Execution Time: 11861.297 ms
DDL
create table transaction_internal_by_addresses
(
address text not null,
block_number bigint,
log_index bigint not null,
transaction_hash text not null,
primary key (address, log_index, transaction_hash)
);
alter table transaction_internal_by_addresses
owner to "icon-worker";
create index transaction_internal_by_address_idx_block_number
on transaction_internal_by_addresses (block_number);
address
(primary key)?block_number
is scanned in the slow query but I am not sure why. Can anyone explain?A multicolumn BTREE index on (address, block_number DESC)
is exactly what the query planner needs to generate the result sets you mentioned. It will random-access the index to the first eligible row, then read the rows out in sequential order until it hits the LIMIT. You can also omit the DESC with no ill effects.
create index address_block_number
on transaction_internal_by_addresses
(address, block_number DESC);
As for asking "why" about query planner results, that's often an enduring mystery.