I have a Postgres database with a postgis extention installed and filles with open street map data.
With the following SQL statement :
SELECT
l.osm_id,
sum(
st_area(st_intersection(ST_Buffer(l.way, 30), p.way))
/
st_area(ST_Buffer(l.way, 30))
) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, ST_Buffer(p.way,30))
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id;
I calculate a "green" score.
My goal is to create a "green" score for each osm_id.
Which means; how much of a road is near a water, forrest or something similar.
To do so:
I Create a 30 meter buffer around each way, and calculate the intersection between that buffered way and any green features nearby.
I am using 'green features' to refer to polygons in OpenStreetMap's database, such as a park.
Is it possbile to accelerate this calculation?
One thing I id is to create 2 indices in a hope to accelerate the calculation:
CREATE INDEX way_index_2 on planet_osm_polygon USING gist(way) WHERE "natural" IN ('water','wood','forest','hill','valley');
CREATE INDEX way_index_3 on planet_osm_polygon USING gist(way) WHERE "landuse" IN ('forest');
Here is an "explain" to this statement:
EXPLAIN (ANALYZE, BUFFERS) SELECT
l.osm_id,
sum(
st_area(st_intersection(ST_Buffer(l.way, 30), p.way))
/
st_area(ST_Buffer(l.way, 30))
) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, ST_Buffer(p.way,30))
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=17816.83..133789235.22 rows=1 width=16) (actual time=1575643.737..1575651.862 rows=1 loops=1)
Buffers: shared hit=816315 read=221878, temp read=313661 written=73938
-> GroupAggregate (cost=17816.83..1435958589062981.00 rows=10734420 width=16) (actual time=1575643.723..1575651.847 rows=1 loops=1)
Group Key: l.osm_id
Buffers: shared hit=816315 read=221878, temp read=313661 written=73938
-> Nested Loop (cost=17816.83..1433802261939271.50 rows=28652652777 width=448) (actual time=978502.788..1575648.857 rows=8 loops=1)
Join Filter: st_intersects(l.way, st_buffer(p.way, '30'::double precision, ''::text))
Rows Removed by Join Filter: 6528525
Buffers: shared hit=816315 read=221878, temp read=313661 written=73938
-> Index Scan using osm_id_idx on planet_osm_line l (cost=0.44..1242021.57 rows=22671610 width=247) (actual time=5.963..6.181 rows=6 loops=1)
Buffers: shared hit=5 read=3
-> Materialize (cost=17816.39..1445364.98 rows=1263812 width=201) (actual time=85.181..4605.942 rows=1088089 loops=6)
Buffers: shared hit=380236 read=220348, temp read=313661 written=73938
-> Gather (cost=17816.39..1403253.92 rows=1263812 width=201) (actual time=510.609..1066.182 rows=1250378 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=380236 read=220348
-> Parallel Bitmap Heap Scan on planet_osm_polygon p (cost=16816.39..1275872.72 rows=315953 width=201) (actual time=447.168..9410.838 rows=250076 loops=5)
Recheck Cond: (("natural" = ANY ('{water,wood,forest,hill,valley}'::text[])) OR (landuse = 'forest'::text))
Rows Removed by Index Recheck: 2554266
Filter: (("natural" = 'water'::text) OR (landuse = 'forest'::text))
Rows Removed by Filter: 53217
Heap Blocks: lossy=1
Buffers: shared hit=380236 read=220348
-> BitmapOr (cost=16816.39..16816.39 rows=1554297 width=0) (actual time=491.891..491.893 rows=0 loops=1)
Buffers: shared hit=7797
-> Bitmap Index Scan on way_index_2 (cost=0.00..7822.79 rows=750359 width=0) (actual time=413.690..413.690 rows=737741 loops=1)
Buffers: shared hit=3758
-> Bitmap Index Scan on way_index_3 (cost=0.00..8361.69 rows=803938 width=0) (actual time=78.198..78.198 rows=783702 loops=1)
Buffers: shared hit=4039
Planning Time: 0.315 ms
Execution Time: 1575673.609 ms
(32 rows)
UPDATE:
Here is a short schema of the both tables;
Table
"public.planet_osm_line"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------------------+---------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
osm_id | bigint | | | | plain | | |
access | text | | | | extended | | |
addr:housename | text | | | | extended | | |
addr:housenumber | text | | | | extended | | |
addr:interpolation | text | | | | extended | | |
admin_level | text | | | | extended | | |
....
natural | text | | | | extended | | |
Indexes:
"highway_idx" btree (highway)
"motorway_idx" gist (way) WHERE highway = 'motorway'::text
"motorway_trunk_primary_secondary_tertiary_unclassified_idx" gist (way) WHERE highway = ANY (ARRAY['motorway'::text, 'trunk'::text, 'primary'::text, 'secondary'::text, 'tertiary'::text, 'unclassified'::text])
"name_idx" btree (name)
"osm_id_idx" btree (osm_id)
"planet_osm_line_osm_id_idx" btree (osm_id)
"planet_osm_line_way_idx" gist (way)
"primary_idx" gist (way) WHERE highway = 'primary'::text
"primary_secondary_idx" gist (way) WHERE highway = ANY (ARRAY['primary'::text, 'secondary'::text])
"primary_secondary_tertiary_idx" gist (way) WHERE highway = ANY (ARRAY['primary'::text, 'secondary'::text, 'tertiary'::text])
"primary_secondary_tertiary_unclassified_idx" gist (way) WHERE highway = ANY (ARRAY['primary'::text, 'secondary'::text, 'tertiary'::text, 'unclassified'::text])
"secondary_idx" gist (way) WHERE highway = 'secondary'::text
"secondary_tertiary_idx" gist (way) WHERE highway = ANY (ARRAY['secondary'::text, 'tertiary'::text])
"tertiary_idx" gist (way) WHERE highway = 'tertiary'::text
"tertiary_secondary_idx" gist (way) WHERE highway = ANY (ARRAY['tertiary'::text, 'unclassified'::text])
"trunk_idx" gist (way) WHERE highway = 'trunk'::text
"trunk_primary_secondary_tertiary_unclassified_idx" gist (way) WHERE highway = ANY (ARRAY['trunk'::text, 'primary'::text, 'secondary'::text, 'tertiary'::text, 'unclassified'::text])
"unclassified_idx" gist (way) WHERE highway = 'unclassified'::text
"way_idx" gist (way)
"way_index_1" gist (way)
"way_index_4" gist (way) WHERE "natural" = ANY (ARRAY['water'::text, 'wood'::text, 'forest'::text, 'hill'::text, 'valley'::text])
"way_index_5" gist (way) WHERE landuse = 'forest'::text
Triggers:
planet_osm_line_osm2pgsql_valid BEFORE INSERT OR UPDATE ON planet_osm_line FOR EACH ROW EXECUTE FUNCTION planet_osm_line_osm2pgsql_valid()
Access method: heap
And:
Table "public.planet_osm_polygon"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------------------+-------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
osm_id | bigint | | | | plain | | |
access | text | | | | extended | | |
addr:housename | text | | | | extended | | |
addr:housenumber | text | | | | extended | | |
addr:interpolation | text | | | | extended | | |
admin_level | text | | | | extended | | |
aerialway | text | | | | extended | | |
aeroway | text | | | | extended | | |
amenity | text | | | | extended | | |
area | text | | | | extended | | |
barrier | text | | | | extended | | |
landuse | text | | | | extended | | |
Indexes:
"fuel_toilet_parking_restaurant_idex" gist (way) WHERE amenity = ANY (ARRAY['fuel'::text, 'toilets'::text, 'parking'::text, 'restaurant'::text, 'cafe'::text, 'pub'::text, 'ice_cream'::text, 'biergarten'::text])
"planet_osm_polygon_osm_id_idx" btree (osm_id)
"planet_osm_polygon_way_idx" gist (way)
"viewpoint_attraction_guest_house_idex" gist (way) WHERE tourism = ANY (ARRAY['viewpoint '::text, 'attraction'::text, 'guest_house'::text])
"way_index_2" gist (way) WHERE "natural" = ANY (ARRAY['water'::text, 'wood'::text, 'forest'::text, 'hill'::text, 'valley'::text])
"way_index_3" gist (way) WHERE landuse = 'forest'::text
Triggers:
planet_osm_polygon_osm2pgsql_valid BEFORE INSERT OR UPDATE ON planet_osm_polygon FOR EACH ROW EXECUTE FUNCTION planet_osm_polygon_osm2pgsql_valid()
Access method: heap
UPDATE:
planet_osm_polygon
wood | text | | | | extended | | |
z_order | integer | | | | plain | | |
way_area | real | | | | plain | | |
way | geometry(Geometry,3857) | | | | main | | |
way_buffer_30 | geometry(Polygon) | | | | external | | |
way_buffer_30_area | numeric | | | generated always as (st_area(way_buffer_30)) stored | main | | |
AND
planet_osm_line
way_area | real | | | | plain | | |
way | geometry(LineString,3857) | | | | main | | |
Not really an answer as much as a bunch of comments too long for a comment format:
alter system set default_toast_compression=lz4;
alter table planet_osm_polygon
add column way_buffer_30 geometry,
alter column way_buffer_30 set storage external,
add column way_buffer_30_area numeric
generated always as (st_area(way_buffer_30)) stored;
update planet_osm_polygon
set way_buffer_30=st_buffer(way,30,'quad_segs=1');
create index osmp_way_buffer_30_gix on planet_osm_polygon
using gist(way_buffer_30) with (fillfactor=100);
cluster verbose planet_osm_polygon using osmp_way_buffer_30_gix;
alter table planet_osm_line
add column way_buffer_30 geometry,
alter column way_buffer_30 set storage external,
add column way_buffer_30_area numeric
generated always as (st_area(way_buffer_30)) stored;
update planet_osm_line
set way_buffer_30=st_buffer(way,30,'quad_segs=1');
create index osml_way_buffer_30_gix on planet_osm_line
using gist(way_buffer_30) with (fillfactor=100);
cluster verbose planet_osm_polygon using osml_way_buffer_30_gix;
SELECT
l.osm_id,
sum(st_area(st_intersection(l.way_buffer_30, p.way))
/ l.way_buffer_30_area
) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, p.way_buffer_30)
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id;
default_toast_compression=lz4
You're working with polygons, which are likely to be compressed and TOASTed. Default default_toast_compression=pglz
is typically slower than lz4
. Note that you need to force a re-write of those tables/columns after altering the setting, otherwise it'll apply as a default only from that point onwards, not affecting anything retroactively.storage external
skips one step in retrieving the data if the shapes are big/complex enough. Might be worth it depending on your PostGIS version.st_buffer(way,30,'quad_segs=1')
you can tweak the third parameter to get a simpler shape, that's easier to compare against. Default quad_segs=8
can cause your buffer to have 8x more vertices than the input.with (fillfactor=100)
I assume it's a static source table that you replace when a new version is published, rather than maintain the current one yourself. Therefore, indexes can be made static, too (default fillfactor=90
is meant to account for new rows inserted into the table).cluster
aligns table pages with the index, making heap fetches faster.ST_Buffer()
. Same goes for ST_Transform()
- apply it on the column before indexing, not in the query.ST_DWithin()
speeds up the join compared to your current code and the one suggested above. Be sure to test with a suitable index. Keep in mind the unit of distance depends on the column SRID - if you're using metric you'll get anomalies from systems in imperial/nautical/degrees.
INNER JOIN planet_osm_polygon AS p ON ST_DWithin(l.way,p.way,30)
create index way_buffer_30_gix on planet_osm_polygon
using gist(way_buffer_30) with (fillfactor=100)
where natural='water' or landuse='forest';
where
clause.pgbench
to run your benchmarks.