Search code examples
sqlpostgresqlrecursiongispostgis

How to select all groups of intersecting polygons from a single table using PostGIS


Given a table of groups of intersecting polygons as shown.enter image description here(https://i.sstatic.net/hlXs1.png) How can we select all the groups of polygons that intersect?

The following query, using recursion, gets some of the way there but it duplicates all the groups and groups with multiple overlaps duplicate polygons within the group.

WITH RECURSIVE cte AS (

     SELECT id AS root_id,
            id, ARRAY[id] AS path,
            geom
       FROM polygons

      UNION ALL
     SELECT cte.root_id,
            t.id, path || t.id,
            t.geom
       FROM polygons AS t,
            cte
      WHERE t.id <> ALL(cte.path)
        AND ST_Intersects(t.geom, cte.geom) 
)

SELECT root_id, ARRAY_AG(id)
  FROM cte
 GROUP BY root_id
 ORDER BY root_id

The resulting selection looks like this:

root_id array_agg
1 1,2
2 2,1
3 3,4,5,6
4 4,3,5,6
5 5,4,6,3
6 6,5,4,3
7 7,8,9,9,8
8 8,7,9,9,7
9 9,7,8,8,7
10 10

As you will notice, selections with root_id 1 and 2 contain the same polygons and root_id 7, 8 and 9 all duplicate polygons within the group.


Solution

  • There's a function for exactly that, named exactly like that: ST_ClusterIntersectingWin(). Added in PostGIS 3.4.0.
    Demo at db<>fiddle:

    enter image description here

    explain analyze verbose 
    create table my_shapes_clustered2 as
    select 1+ST_ClusterIntersectingWin(geom)over() as cluster_number, *
    from my_shapes;
    
    QUERY PLAN
    WindowAgg (cost=0.00..15091.00 rows=1200 width=391) (actual time=5.639..5.948 rows=1200 loops=1)
      Output: (1 + st_clusterintersectingwin(geom) OVER (?)), id, initial_seed, geom
      -> Seq Scan on public.my_shapes (cost=0.00..76.00 rows=1200 width=387) (actual time=0.006..0.215 rows=1200 loops=1)
            Output: geom, id, initial_seed
    Planning Time: 0.063 ms
    Execution Time: 8.455 ms

    Inspecting some samples:

    select cluster_number,st_area(st_union(geom)),count(*),array_agg(id)
    from my_shapes_clustered2 
    group by cluster_number 
    order by cluster_number 
    limit 8;
    
    cluster_number st_area count array_agg
    1 836.248837776196 2 {798,1}
    2 651.4061064388445 1 {2}
    3 53.56050595625381 1 {3}
    4 269.4408305746047 1 {4}
    5 1596.5821681225993 4 {507,5,676,72}
    6 1262.1100035153077 3 {186,296,6}
    7 2008.7632476757103 4 {474,891,7,1121}
    8 66.289777695502 1 {8}

    In PostGIS 2.3.0+ you can get the same result, although slightly slower, using the ST_ClusterDBSCAN(geom,0,1) function suggested by @JGH:

    explain analyze verbose 
    create table my_shapes_clustered3 as
    select 1+ST_ClusterDBSCAN(geom,0,1)over() as cluster_number, *
    from my_shapes;
    
    QUERY PLAN
    WindowAgg (cost=0.00..15091.00 rows=1200 width=391) (actual time=17.688..18.018 rows=1200 loops=1)
      Output: (1 + st_clusterdbscan(geom, '0'::double precision, 1) OVER (?)), id, initial_seed, geom
      -> Seq Scan on public.my_shapes (cost=0.00..76.00 rows=1200 width=387) (actual time=0.005..0.218 rows=1200 loops=1)
            Output: geom, id, initial_seed
    Planning Time: 0.041 ms
    Execution Time: 21.007 ms

    In 2.2.0 there's a bit more clunky ST_ClusterIntersecting() that's spitting out collections.