Every time I run an EXPLAIN ANALYZE
over a query in PostgreSQL, Execution time
decreases. Why?
I need to do some indexing on the table and this way I can't be sure if my actions which will enhance performance. What do you recommend me?
Example of the result of successive execution of the query explain (analyze, buffers) :
my_db=# explain (analyze, buffers)
SELECT COUNT(*) AS count
FROM my_view
WHERE creation_time >= '2019-01-18 00:00:00'
AND creation_time <= '2019-01-18 09:43:36'
ORDER BY count DESC
LIMIT 10000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------
Limit (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=24380.613..24380.614 rows=1 loops=1)
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Sort (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=24380.611..24380.611 rows=1 loops=1)
Sort Key: (count(*)) DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Aggregate (cost=2568854.21..2568854.22 rows=1 width=8) (actual time=24380.599..24380.599 rows=1 loops=1)
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> GroupAggregate (cost=2568854.18..2568854.20 rows=1 width=28) (actual time=24339.455..24380.589 rows=40 loops=1)
Group Key: my_table.creation_time, my_table.some_field
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Sort (cost=2568854.18..2568854.18 rows=1 width=12) (actual time=24338.361..24357.171 rows=199309 loops=1)
Sort Key: my_table.creation_time, my_table.some_field
Sort Method: external merge Disk: 4496kB
Buffers: shared hit=14915 read=2052993, temp read=562 written=563
-> Gather (cost=1000.00..2568854.17 rows=1 width=12) (actual time=23799.237..24142.217 rows=199309 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=14915 read=2052993
-> Parallel Seq Scan on my_table (cost=0.00..2567854.07 rows=1 width=12) (actual time=23796.695..24087.574 rows=66436 loops=3)
Filter: (creation_time >= '2019-01-18 00:00:00+00'::timestamp with time zone) AND (creation_time <= '2019-01-18 09:43:36+00'::timestamp with time zone)
Rows Removed by Filter: 21818095
Buffers: shared hit=14915 read=2052993
Planning time: 10.982 ms
Execution time: 24381.544 ms
(25 rows)
my_db=# explain (analyze, buffers)
SELECT COUNT(*) AS count
FROM my_view
WHERE creation_time >= '2019-01-18 00:00:00'
AND creation_time <= '2019-01-18 09:43:36'
ORDER BY count DESC
LIMIT 10000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------
Limit (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=6836.247..6836.248 rows=1 loops=1)
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Sort (cost=2568854.23..2568854.23 rows=1 width=8) (actual time=6836.245..6836.246 rows=1 loops=1)
Sort Key: (count(*)) DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Aggregate (cost=2568854.21..2568854.22 rows=1 width=8) (actual time=6836.232..6836.232 rows=1 loops=1)
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> GroupAggregate (cost=2568854.18..2568854.20 rows=1 width=28) (actual time=6792.036..6836.221 rows=40 loops=1)
Group Key: my_table.creation_time, my_table.some_field
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Sort (cost=2568854.18..2568854.18 rows=1 width=12) (actual time=6790.807..6811.469 rows=199309 loops=1)
Sort Key: my_table.creation_time, my_table.some_field
Sort Method: external merge Disk: 4496kB
Buffers: shared hit=15181 read=2052727, temp read=562 written=563
-> Gather (cost=1000.00..2568854.17 rows=1 width=12) (actual time=6271.571..6604.946 rows=199309 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=15181 read=2052727
-> Parallel Seq Scan on my_table (cost=0.00..2567854.07 rows=1 width=12) (actual time=6268.383..6529.416 rows=66436 loops=3)
Filter: (creation_time >= '2019-01-18 00:00:00+00'::timestamp with time zone) AND (creation_time <= '2019-01-18 09:43:36+00'::timestamp with time zone)
Rows Removed by Filter: 21818095
Buffers: shared hit=15181 read=2052727
Planning time: 0.570 ms
Execution time: 6837.137 ms
(25 rows)
Thank you.
What version of PostgreSQL is this?
There are two problems:
A missing index:
CREATE INDEX ON my_table (creation_time);
The value distribution statistics for creation
time are way off, which causes PostgreSQL to underestimate the number of result rows terribly.
This is not the immediate cause of your problem, but you should investigate what is going on there:
If ANALYZE my_table
improves the estimate, see that autoanalyze runs more often for that table.
If ANALYZE my_table
does not help, collect better statistics:
ALTER TABLE my_table ALTER creation_time SET STATISTICS 1000;
ANALYZE my_table;
The reduced execution time you observe with EXPLAIN (ANALYZE)
is probably due to caching effects.