Search code examples
postgresqlquery-optimizationpostgisopenstreetmapspatial-query

Postgis query over 2 tables explodes in runtime


Two small postgis queries alone work fine:

SELECT a.name, a.population, a.place 
   FROM SH_POINT a 
   WHERE a.place IN ('hamlet', 'city', 'town', 'village');

Returns 3700 rows in only 0.26s.

SELECT b.name,b.population 
   FROM SH_POLY b 
   WHERE b.boundary='administrative' AND b.admin_level='8';

Return 1161 rows in only 0.35s.

Now I want to combine these results simply based on the name. With such small datasets it should be no problem but

SELECT a.name, a.population, a.place, b.population 
   FROM SH_POINT a, SH_POLY b 
   WHERE a.place in ('hamlet', 'city', 'town', 'village') 
      AND b.boundary='administrative' AND b.admin_level='8' 
      AND a.name = b.name;

takes over 16 minutes for only 1273 rows!

Other variants like

WITH 
   a AS (SELECT name, population, place 
            FROM SH_POINT 
            WHERE place IN ('hamlet', 'city', 'town', 'village') ), 
   b AS (SELECT name, population 
            FROM SH_POLY 
            WHERE boundary='administrative' AND admin_level='8' ) 

SELECT a.name, a.population, a.place, b.population 
   FROM a, b 
   WHERE a.name = b.name;

fail, too: Time: 990295,242 ms (16:30,295)

SH_POINT and SH_POLY are views to a normal planet_osm_* postgis table to reduce the interesting data to a specific area. (As you can see in the first two single queries this works fine.)

CREATE VIEW SH_POINT AS SELECT * FROM planet_osm_point WHERE 
   way @ (SELECT ST_Collect(way) FROM planet_osm_polygon 
             WHERE name='Schleswig-Holstein' AND admin_level='4') 
   AND ST_Within(way, (SELECT ST_COLLECT(way) FROM planet_osm_polygon WHERE 
                          name='Schleswig-Holstein' AND admin_level='4') );

A query analysis for another variant:

EXPLAIN (ANALYZE, BUFFERS) SELECT a.name, a.population, a.place, b.population from SH_POINT a FULL OUTER JOIN SH_POLY b ON a.name = b.name WHERE a.place IN ('hamlet', 'city', 'town', 'village') AND b.boundary='administrative' AND b.admin_level='8';
Time: 1008286,628 ms (16:48,287)

                                                                          QUERY PLAN                                                                      
       
-------------------------------------------------------------------------------------------------------------------------------------------------------------
-------
 Nested Loop  (cost=242.04..304.10 rows=1 width=34) (actual time=524.970..1008283.783 rows=1273 loops=1)
   Buffers: shared hit=811729 read=193326311
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=57.88..57.89 rows=1 width=32) (actual time=0.289..0.291 rows=1 loops=1)
           Buffers: shared hit=3 read=22
           ->  Bitmap Heap Scan on planet_osm_polygon planet_osm_polygon_1  (cost=52.49..56.51 rows=1 width=189) (actual time=0.084..0.089 rows=2 loops=1)
                 Recheck Cond: ((name = 'Schleswig-Holstein'::text) AND (admin_level = '4'::text))
                 Heap Blocks: exact=2
                 Buffers: shared hit=3 read=6
                 ->  BitmapAnd  (cost=52.49..52.49 rows=1 width=0) (actual time=0.073..0.075 rows=0 loops=1)
                       Buffers: shared hit=2 read=5
                       ->  Bitmap Index Scan on planet_osm_polygon_name_idx  (cost=0.00..5.07 rows=67 width=0) (actual time=0.047..0.047 rows=2 loops=1)
                             Index Cond: (name = 'Schleswig-Holstein'::text)
                             Buffers: shared hit=2 read=2
                       ->  Bitmap Index Scan on planet_osm_polygon_adminlevel_idx  (cost=0.00..47.17 rows=4098 width=0) (actual time=0.023..0.023 rows=106 lo
ops=1)
                             Index Cond: (admin_level = '4'::text)
                             Buffers: shared read=3
   InitPlan 2 (returns $1)
     ->  Aggregate  (cost=57.88..57.89 rows=1 width=32) (actual time=0.119..0.121 rows=1 loops=1)
           Buffers: shared hit=25
           ->  Bitmap Heap Scan on planet_osm_polygon planet_osm_polygon_2  (cost=52.49..56.51 rows=1 width=189) (actual time=0.030..0.033 rows=2 loops=1)
                 Recheck Cond: ((name = 'Schleswig-Holstein'::text) AND (admin_level = '4'::text))
                 Heap Blocks: exact=2
                 Buffers: shared hit=9
                 ->  BitmapAnd  (cost=52.49..52.49 rows=1 width=0) (actual time=0.027..0.028 rows=0 loops=1)
                       Buffers: shared hit=7
                       ->  Bitmap Index Scan on planet_osm_polygon_name_idx  (cost=0.00..5.07 rows=67 width=0) (actual time=0.015..0.015 rows=2 loops=1)
                             Index Cond: (name = 'Schleswig-Holstein'::text)
                             Buffers: shared hit=4
                       ->  Bitmap Index Scan on planet_osm_polygon_adminlevel_idx  (cost=0.00..47.17 rows=4098 width=0) (actual time=0.010..0.011 rows=106 lo
ops=1)
                             Index Cond: (admin_level = '4'::text)
                             Buffers: shared hit=3
   InitPlan 3 (returns $2)
     ->  Aggregate  (cost=57.88..57.89 rows=1 width=32) (actual time=0.122..0.124 rows=1 loops=1)
           Buffers: shared hit=25
           ->  Bitmap Heap Scan on planet_osm_polygon planet_osm_polygon_3  (cost=52.49..56.51 rows=1 width=189) (actual time=0.032..0.035 rows=2 loops=1)
                 Recheck Cond: ((name = 'Schleswig-Holstein'::text) AND (admin_level = '4'::text))
                 Heap Blocks: exact=2
                 Buffers: shared hit=9
                 ->  BitmapAnd  (cost=52.49..52.49 rows=1 width=0) (actual time=0.029..0.030 rows=0 loops=1)
                       Buffers: shared hit=7
                       ->  Bitmap Index Scan on planet_osm_polygon_name_idx  (cost=0.00..5.07 rows=67 width=0) (actual time=0.017..0.017 rows=2 loops=1)
                             Index Cond: (name = 'Schleswig-Holstein'::text)
                             Buffers: shared hit=4
                       ->  Bitmap Index Scan on planet_osm_polygon_adminlevel_idx  (cost=0.00..47.17 rows=4098 width=0) (actual time=0.010..0.010 rows=106 loops=1)
                             Index Cond: (admin_level = '4'::text)
                             Buffers: shared hit=3
   InitPlan 4 (returns $3)
     ->  Aggregate  (cost=57.88..57.89 rows=1 width=32) (actual time=0.192..0.193 rows=1 loops=1)
           Buffers: shared hit=25
           ->  Bitmap Heap Scan on planet_osm_polygon planet_osm_polygon_4  (cost=52.49..56.51 rows=1 width=189) (actual time=0.026..0.028 rows=2 loops=1)
                 Recheck Cond: ((name = 'Schleswig-Holstein'::text) AND (admin_level = '4'::text))
                 Heap Blocks: exact=2

                 
                 Buffers: shared hit=9
                 ->  BitmapAnd  (cost=52.49..52.49 rows=1 width=0) (actual time=0.023..0.024 rows=0 loops=1)
                       Buffers: shared hit=7
                       ->  Bitmap Index Scan on planet_osm_polygon_name_idx  (cost=0.00..5.07 rows=67 width=0) (actual time=0.013..0.013 rows=2 loops=1)
                             Index Cond: (name = 'Schleswig-Holstein'::text)
                             Buffers: shared hit=4
                       ->  Bitmap Index Scan on planet_osm_polygon_adminlevel_idx  (cost=0.00..47.17 rows=4098 width=0) (actual time=0.009..0.009 rows=106 loops=1)
                             Index Cond: (admin_level = '4'::text)
                             Buffers: shared hit=3
   ->  Index Scan using planet_osm_point_place on planet_osm_point  (cost=0.28..33.31 rows=1 width=29) (actual time=3.592..85.382 rows=3699 loops=1)
         Index Cond: ((way @ $0) AND (way @ $1))
         Filter: ((place = ANY ('{hamlet,city,town,village}'::text[])) AND st_within(way, $1))
         Rows Removed by Filter: 4951
         Buffers: shared hit=49 read=4234
   ->  Bitmap Heap Scan on planet_osm_polygon  (cost=10.19..39.21 rows=1 width=24) (actual time=272.552..272.555 rows=0 loops=3699)
         Recheck Cond: ((way @ $2) AND (name = planet_osm_point.name))
         Rows Removed by Index Recheck: 5
         Filter: ((boundary = 'administrative'::text) AND (admin_level = '8'::text) AND st_within(way, $3))
         Rows Removed by Filter: 0
         Heap Blocks: exact=1369 lossy=912
         Buffers: shared hit=811680 read=193322077
         ->  BitmapAnd  (cost=10.19..10.19 rows=1 width=0) (actual time=272.290..272.290 rows=0 loops=3699)
               Buffers: shared hit=811480 read=193319438
               ->  Bitmap Index Scan on planet_osm_polygon_way_idx  (cost=0.00..4.87 rows=45 width=0) (actual time=271.210..271.210 rows=2789969 loops=3699)
                     Index Cond: ((way @ $2) AND (way @ $3))
                     Buffers: shared hit=811430 read=193304692
               ->  Bitmap Index Scan on planet_osm_polygon_name_idx  (cost=0.00..5.07 rows=67 width=0) (actual time=0.018..0.018 rows=2 loops=3699)
                     Index Cond: (name = planet_osm_point.name)
                     Buffers: shared hit=50 read=14746
 Planning:
   Buffers: shared hit=10
 Planning Time: 1.283 ms
 Execution Time: 1008284.465 ms
                 

The only workaround I have found so far is to store both queries in new tables:

CREATE TABLE ppoint AS SELECT a.name, a.population, a.place 
   FROM SH_POINT a 
   WHERE a.place IN ('hamlet', 'city', 'town', 'village') ;

CREATE TABLE ppoly AS SELECT b.name, b.population 
   FROM SH_POLY b 
   WHERE b.boundary='administrative' AND b.admin_level='8';

SELECT a.name, a.population, a.place, b.population 
   FROM ppoint a, ppoly b 
   WHERE a.name=b.name;

This works in less than 3ms but is not a desired option here.

  • What is the root cause for such a long runtime?
  • Can the query be modified to give results in acceptable times?

Thanks for your help!


Solution

  • The planner has rewritten the query embedding the view code in an inefficient way.

    Using the CTE, you can force it to first compute (materialize) the selection from the views:

    WITH 
       a AS MATERIALIZED  (SELECT name, population, place 
                FROM SH_POINT 
                WHERE place IN ('hamlet', 'city', 'town', 'village') ), 
       b AS MATERIALIZED  (SELECT name, population 
                FROM SH_POLY 
                WHERE boundary='administrative' AND admin_level='8' ) 
    
    SELECT a.name, a.population, a.place, b.population 
       FROM a, b 
       WHERE a.name = b.name;