These are the EXPLAIN ANALYZE results for a simple query that returns no more than about 150 records, joining tables that mostly hold fewer than 200 records each. One table stores the latest value per reading, and its other columns are foreign keys to the related data.
Update: see the new results from the same query a few hours later. The site is not public and there should be no users right now, as it is still in development.
explain analyze
SELECT lv.station_id,
s.name AS station_name,
s.latitude,
s.longitude,
s.elevation,
lv.element_id,
e.symbol AS element_symbol,
u.symbol,
e.name AS element_name,
lv.last_datetime AS datetime,
lv.last_value AS valor,
s.basin_id,
s.municipality_id
FROM (((element_station lv /*350 records*/
JOIN stations s ON ((lv.station_id = s.id))) /*40 records*/
JOIN elements e ON ((lv.element_id = e.id))) /*103 records*/
JOIN units u ON ((e.unit_id = u.id))) /* 32 records */
WHERE s.id = lv.station_id AND e.id = lv.element_id AND lv.interval_id = 6 and
lv.last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)
I have already tried VACUUM; it helps for a while, but after some time the execution time goes up again. I have also created an index on the filtered fields.
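The index shows up as element_station_latest in the second plan below; a sketch of its definition, with the column order assumed from the Index Cond in that plan, would be:

CREATE INDEX element_station_latest
    ON element_station (last_datetime, interval_id);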
Nested Loop (cost=0.29..2654.66 rows=1 width=92) (actual time=1219.390..35296.253 rows=157 loops=1)
Join Filter: (e.unit_id = u.id)
Rows Removed by Join Filter: 4867
-> Nested Loop (cost=0.29..2652.93 rows=1 width=92) (actual time=1219.383..35294.083 rows=157 loops=1)
Join Filter: (lv.element_id = e.id)
Rows Removed by Join Filter: 16014
-> Nested Loop (cost=0.29..2648.62 rows=1 width=61) (actual time=1219.301..35132.373 rows=157 loops=1)
-> Seq Scan on element_station lv (cost=0.00..2640.30 rows=1 width=20) (actual time=1219.248..1385.517 rows=157 loops=1)
Filter: ((interval_id = 6) AND (last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)))
Rows Removed by Filter: 168
-> Index Scan using stations_pkey on stations s (cost=0.29..8.31 rows=1 width=45) (actual time=3.471..214.941 rows=1 loops=157)
Index Cond: (id = lv.station_id)
-> Seq Scan on elements e (cost=0.00..3.03 rows=103 width=35) (actual time=0.003..0.999 rows=103 loops=157)
-> Seq Scan on units u (cost=0.00..1.32 rows=32 width=8) (actual time=0.002..0.005 rows=32 loops=157)
Planning time: 8.312 ms
Execution time: 35296.427 ms
Update: the same query run again tonight; no changes were made to the schema or the data:
Sort (cost=601.74..601.88 rows=55 width=92) (actual time=1.822..1.841 rows=172 loops=1)
Sort Key: lv.last_datetime DESC
Sort Method: quicksort Memory: 52kB
-> Nested Loop (cost=11.60..600.15 rows=55 width=92) (actual time=0.287..1.680 rows=172 loops=1)
-> Hash Join (cost=11.31..248.15 rows=55 width=51) (actual time=0.263..0.616 rows=172 loops=1)
Hash Cond: (e.unit_id = u.id)
-> Hash Join (cost=9.59..245.60 rows=75 width=51) (actual time=0.225..0.528 rows=172 loops=1)
Hash Cond: (lv.element_id = e.id)
-> Bitmap Heap Scan on element_station lv (cost=5.27..240.25 rows=75 width=20) (actual time=0.150..0.359 rows=172 loops=1)
Recheck Cond: ((last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)) AND (interval_id = 6))
Heap Blocks: exact=22
-> Bitmap Index Scan on element_station_latest (cost=0.00..5.25 rows=75 width=0) (actual time=0.136..0.136 rows=226 loops=1)
Index Cond: ((last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)) AND (interval_id = 6))
-> Hash (cost=3.03..3.03 rows=103 width=35) (actual time=0.062..0.062 rows=103 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 15kB
-> Seq Scan on elements e (cost=0.00..3.03 rows=103 width=35) (actual time=0.006..0.031 rows=103 loops=1)
-> Hash (cost=1.32..1.32 rows=32 width=8) (actual time=0.019..0.019 rows=32 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Seq Scan on units u (cost=0.00..1.32 rows=32 width=8) (actual time=0.003..0.005 rows=32 loops=1)
-> Index Scan using stations_pkey on stations s (cost=0.29..6.39 rows=1 width=45) (actual time=0.005..0.006 rows=1 loops=172)
Index Cond: (id = lv.station_id)
Planning time: 2.390 ms
Execution time: 2.009 ms
The problem is the misestimate of the number of rows in the sequential scan on element_station. Either autoanalyze has kicked in and calculated new statistics for the table, or the data have changed.
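If the estimates drift again before autoanalyze catches up, refreshing the statistics manually should bring them back in line; a minimal example, assuming the table name from the plans above:

ANALYZE element_station;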
The problem is probably that PostgreSQL doesn't know the result of
((now() - '06:00:00'::interval) - '01:00:00'::interval)
at query planning time.
If that is possible for you, do it in two steps: first calculate the expression above (either in PostgreSQL or on the client side), then run the query with the result as a constant. That makes it easier for PostgreSQL to estimate the row count.
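A minimal sketch of that two-step approach, with a trimmed select list for brevity; the timestamp literal in step 2 is only a placeholder for whatever step 1 returns:

-- Step 1: compute the cutoff once, in PostgreSQL or on the client side.
SELECT now() - interval '06:00:00' - interval '01:00:00' AS cutoff;

-- Step 2: rerun the query with the returned value as a constant,
-- e.g. if step 1 returned '2018-05-03 13:00:00+00':
SELECT lv.station_id,
       s.name AS station_name,
       lv.last_datetime AS datetime,
       lv.last_value AS valor
FROM element_station lv
JOIN stations s ON lv.station_id = s.id
JOIN elements e ON lv.element_id = e.id
JOIN units u ON e.unit_id = u.id
WHERE lv.interval_id = 6
  AND lv.last_datetime >= timestamptz '2018-05-03 13:00:00+00';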