I have 2 tables. Their structure is roughly as follows; I've changed the names, though.
CREATE TABLE overlay_polygon
(
overlay_polygon_id SERIAL PRIMARY KEY,
some_other_polygon_id INTEGER REFERENCES some_other_polygon (some_other_polygon_id)
dollar_value NUMERIC,
geom GEOMETRY(Polygon,26915)
)
CREATE TABLE point
(
point_id SERIAL PRIMARY KEY,
some_other_polygon_id INTEGER REFERENCES some_other_polygon (some_other_polygon_id)
-- A bunch of other fields that this query won't touch
geom GEOMETRY(Point,26915)
)
point
has a spatial index on its geom
column, named spix_point
, and an index on its some_other_polygon_id
column as well.
There are approximately 500,000 rows in point
, and almost all of the rows in point
intersect with some row in overlay_polygon
. Originally, my overlay_polygon
table contained several rows that had very small areas (less than 1 square meter, mostly) and did not spatially intersect any rows from point
. After deleting the small rows that do not intersect any rows in point
, there are 38 rows.
As the name implies, overlay_polygon
is a table of polygons generated as the result of the overlay of polygons from 3 other tables (including some_other_polygon
). In particular, I need to make some computations using dollar_value
and some columns on point
. When I set out to delete the rows that intersected no points to speed up processing in the future, I ended up querying on the COUNT of rows. The most obvious query seemed to be the following.
SELECT op.*, COUNT(point_id) AS num_points
FROM overlay_polygon op
LEFT JOIN point ON op.some_other_polygon_id = point.some_other_polygon_id AND ST_Intersects(op.geom, point.geom)
GROUP BY op.overlay_polygon_id
ORDER BY op.overlay_polygon_id
;
Here is its EXPLAIN (ANALYZE, BUFFERS)
.
GroupAggregate (cost=544.45..545.12 rows=38 width=8049) (actual time=284962.944..540959.914 rows=38 loops=1)
Buffers: shared hit=58694 read=17119, temp read=189483 written=189483
I/O Timings: read=39171.525
-> Sort (cost=544.45..544.55 rows=38 width=8049) (actual time=271754.952..534154.573 rows=415224 loops=1)
Sort Key: op.overlay_polygon_id
Sort Method: external merge Disk: 897016kB
Buffers: shared hit=58694 read=17119, temp read=189483 written=189483
I/O Timings: read=39171.525
-> Nested Loop Left Join (cost=0.00..543.46 rows=38 width=8049) (actual time=0.110..46755.284 rows=415224 loops=1)
Buffers: shared hit=58694 read=17119
I/O Timings: read=39171.525
-> Seq Scan on overlay_polygon op (cost=0.00..11.38 rows=38 width=8045) (actual time=0.043..153.255 rows=38 loops=1)
Buffers: shared hit=1 read=10
I/O Timings: read=152.866
-> Index Scan using spix_point on point (cost=0.00..13.99 rows=1 width=200) (actual time=50.229..1139.868 rows=10927 loops=38)
Index Cond: (op.geom && geom)
Filter: ((op.some_other_polygon_id = some_other_polygon_id) AND _st_intersects(op.geom, geom))
Rows Removed by Filter: 13353
Buffers: shared hit=58693 read=17109
I/O Timings: read=39018.660
Total runtime: 542172.156 ms
However, I found that this query ran much, much more quickly:
SELECT *
FROM overlay_polygon
JOIN (SELECT op.overlay_polygon_id, COUNT(point_id) AS num_points
FROM overlay_polygon op
LEFT JOIN point ON op.some_other_polygon_id = point.some_other_polygon_id AND ST_Intersects(op.geom, point.geom)
GROUP BY op.overlay_polygon_id
) x ON x.overlay_polygon_id = overlay_polygon.overlay_polygon_id
ORDER BY overlay_polygon.overlay_polygon_id
;
Its EXPLAIN (ANALYZE, BUFFERS)
is below.
Sort (cost=557.78..557.88 rows=38 width=8057) (actual time=18904.661..18904.748 rows=38 loops=1)
Sort Key: overlay_polygon.overlay_polygon_id
Sort Method: quicksort Memory: 126kB
Buffers: shared hit=58690 read=17134
I/O Timings: read=9924.328
-> Hash Join (cost=544.88..556.78 rows=38 width=8057) (actual time=18903.697..18904.210 rows=38 loops=1)
Hash Cond: (overlay_polygon.overlay_polygon_id = op.overlay_polygon_id)
Buffers: shared hit=58690 read=17134
I/O Timings: read=9924.328
-> Seq Scan on overlay_polygon (cost=0.00..11.38 rows=38 width=8045) (actual time=0.127..0.411 rows=38 loops=1)
Buffers: shared hit=2 read=9
I/O Timings: read=0.173
-> Hash (cost=544.41..544.41 rows=38 width=12) (actual time=18903.500..18903.500 rows=38 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 2kB
Buffers: shared hit=58688 read=17125
I/O Timings: read=9924.154
-> HashAggregate (cost=543.65..544.03 rows=38 width=8) (actual time=18903.276..18903.379 rows=38 loops=1)
Buffers: shared hit=58688 read=17125
I/O Timings: read=9924.154
-> Nested Loop Left Join (cost=0.00..543.46 rows=38 width=8) (actual time=0.052..17169.606 rows=415224 loops=1)
Buffers: shared hit=58688 read=17125
I/O Timings: read=9924.154
-> Seq Scan on overlay_polygon op (cost=0.00..11.38 rows=38 width=8038) (actual time=0.004..0.537 rows=38 loops=1)
Buffers: shared hit=1 read=10
I/O Timings: read=0.279
-> Index Scan using spix_point on point (cost=0.00..13.99 rows=1 width=200) (actual time=4.422..381.991 rows=10927 loops=38)
Index Cond: (op.gopm && gopm)
Filter: ((op.some_other_polygon_id = some_other_polygon_id) AND _st_intersects(op.geom, geom))
Rows Removed by Filter: 13353
Buffers: shared hit=58687 read=17115
I/O Timings: read=9923.875
Total runtime: 18905.293 ms
As you can see, they have comparable cost estimates, although I'm not sure how accurate those cost estimates are. I am dubious about cost estimates that involve PostGIS functions. Both tables have had VACUUM ANALYZE FULL
run on them since they were last modified and before running the queries.
Perhaps I am simply unable to read my EXPLAIN ANALYZE
s, but I cannot see why these queries have such drastically different run times. Can anyone identify anything? The only possibility I can think of is related to the number of columns involved in the LEFT JOIN
.
Per the suggestion of @ChrisTravers, I increases work_mem
and reran the first query. I don't believe this represents a significant improvement.
Executed
SET work_mem='4MB';
(It was 1 MB.)
Then executing the first query gave these results.
GroupAggregate (cost=544.45..545.12 rows=38 width=8049) (actual time=339910.046..495775.478 rows=38 loops=1)
Buffers: shared hit=58552 read=17261, temp read=112133 written=112133
-> Sort (cost=544.45..544.55 rows=38 width=8049) (actual time=325391.923..491329.208 rows=415224 loops=1)
Sort Key: op.overlay_polygon_id
Sort Method: external merge Disk: 896904kB
Buffers: shared hit=58552 read=17261, temp read=112133 written=112133
-> Nested Loop Left Join (cost=0.00..543.46 rows=38 width=8049) (actual time=14.698..234266.573 rows=415224 loops=1)
Buffers: shared hit=58552 read=17261
-> Seq Scan on overlay_polygon op (cost=0.00..11.38 rows=38 width=8045) (actual time=14.612..15.384 rows=38 loops=1)
Buffers: shared read=11
-> Index Scan using spix_point on point (cost=0.00..13.99 rows=1 width=200) (actual time=95.262..5451.636 rows=10927 loops=38)
Index Cond: (op.geom && geom)
Filter: ((op.some_other_polygon_id = some_other_polygon_id) AND _st_intersects(op.geom, geom))
Rows Removed by Filter: 13353
Buffers: shared hit=58552 read=17250
Total runtime: 496936.775 ms
Well, here's a nice, big smell that I didn't notice before (largely because I have trouble reading the ANALYZE
output). Sorry I didn't notice it sooner.
Sort (cost=544.45..544.55 rows=38 width=8049) (actual time=271754.952..534154.573 rows=415224 loops=1)
Estimated rows: 38. Actual rows: over 400K. Ideas, anyone?
My immediate thought is that this may have to do with work_mem limits. The difference between the plans is that in the first, you join and then aggregate, and in the second you aggregate and the join. This means your aggregation set narrower, which means less memory is used over that operation.
It would be interesting to see what would change if you tried doubling work_mem and trying again.
Edit: Now that we know increasing the work_mem leads to only a modest improvement, the next issue is the sort row estimate. I suspect it is in fact exceeding work_mem here and it is expecting this will be easy because it expects only 38 rows out, but instead gets a lot of rows out. It isn't clear to me where the planner is getting this information because it's pretty clear that the planner estimates (correctly) that 38 rows is the number of rows we expect out of the aggregate. This part is starting to look like a planner bug to me, but I am having a hard time putting my finger on it. It might be worth writing up and bringing up on the pgsql-general email list. It almost looks to me like the planner is getting confused between memory required for the sort and memory required for the aggregate.