Given a table of groups of intersecting polygons as shown.(https://i.sstatic.net/hlXs1.png) How can we select all the groups of polygons that intersect?
The following query, using recursion, gets some of the way there but it duplicates all the groups and groups with multiple overlaps duplicate polygons within the group.
WITH RECURSIVE cte AS (
SELECT id AS root_id,
id, ARRAY[id] AS path,
geom
FROM polygons
UNION ALL
SELECT cte.root_id,
t.id, path || t.id,
t.geom
FROM polygons AS t,
cte
WHERE t.id <> ALL(cte.path)
AND ST_Intersects(t.geom, cte.geom)
)
SELECT root_id, ARRAY_AG(id)
FROM cte
GROUP BY root_id
ORDER BY root_id
The resulting selection looks like this:
root_id | array_agg |
---|---|
1 | 1,2 |
2 | 2,1 |
3 | 3,4,5,6 |
4 | 4,3,5,6 |
5 | 5,4,6,3 |
6 | 6,5,4,3 |
7 | 7,8,9,9,8 |
8 | 8,7,9,9,7 |
9 | 9,7,8,8,7 |
10 | 10 |
As you will notice, selections with root_id
1 and 2 contain the same polygons and root_id
7, 8 and 9 all duplicate polygons within the group.
There's a function for exactly that, named exactly like that: ST_ClusterIntersectingWin()
. Added in PostGIS 3.4.0.
Demo at db<>fiddle:
explain analyze verbose
create table my_shapes_clustered2 as
select 1+ST_ClusterIntersectingWin(geom)over() as cluster_number, *
from my_shapes;
QUERY PLAN |
---|
WindowAgg (cost=0.00..15091.00 rows=1200 width=391) (actual time=5.639..5.948 rows=1200 loops=1) |
Output: (1 + st_clusterintersectingwin(geom) OVER (?)), id, initial_seed, geom |
-> Seq Scan on public.my_shapes (cost=0.00..76.00 rows=1200 width=387) (actual time=0.006..0.215 rows=1200 loops=1) |
Output: geom, id, initial_seed |
Planning Time: 0.063 ms |
Execution Time: 8.455 ms |
Inspecting some samples:
select cluster_number,st_area(st_union(geom)),count(*),array_agg(id)
from my_shapes_clustered2
group by cluster_number
order by cluster_number
limit 8;
cluster_number | st_area | count | array_agg |
---|---|---|---|
1 | 836.248837776196 | 2 | {798,1} |
2 | 651.4061064388445 | 1 | {2} |
3 | 53.56050595625381 | 1 | {3} |
4 | 269.4408305746047 | 1 | {4} |
5 | 1596.5821681225993 | 4 | {507,5,676,72} |
6 | 1262.1100035153077 | 3 | {186,296,6} |
7 | 2008.7632476757103 | 4 | {474,891,7,1121} |
8 | 66.289777695502 | 1 | {8} |
In PostGIS 2.3.0+ you can get the same result, although slightly slower, using the ST_ClusterDBSCAN(geom,0,1)
function suggested by @JGH:
explain analyze verbose
create table my_shapes_clustered3 as
select 1+ST_ClusterDBSCAN(geom,0,1)over() as cluster_number, *
from my_shapes;
QUERY PLAN |
---|
WindowAgg (cost=0.00..15091.00 rows=1200 width=391) (actual time=17.688..18.018 rows=1200 loops=1) |
Output: (1 + st_clusterdbscan(geom, '0'::double precision, 1) OVER (?)), id, initial_seed, geom |
-> Seq Scan on public.my_shapes (cost=0.00..76.00 rows=1200 width=387) (actual time=0.005..0.218 rows=1200 loops=1) |
Output: geom, id, initial_seed |
Planning Time: 0.041 ms |
Execution Time: 21.007 ms |
In 2.2.0 there's a bit more clunky ST_ClusterIntersecting()
that's spitting out collections.