Search code examples
sqlpostgresqlpostgisspatialpostgresql-performance

Spatial query on large table with multiple self joins performing slow


I am working on queries on a large table in Postgres 9.3.9. It is a spatial dataset and it is spatially indexed. Say, I have need to find 3 types of objects: A, B and C. The criteria is that B and C are both within certain distance of A, say 500 meters.

My query is like this:

select 
  school.osm_id as school_osm_id, 
  school.name as school_name, 
  school.way as school_way, 
  restaurant.osm_id as restaurant_osm_id, 
  restaurant.name as restaurant_name, 
  restaurant.way as restaurant_way, 
  bar.osm_id as bar_osm_id, 
  bar.name as bar_name, 
  bar.way as bar_way 
from (
    select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'school') as school, 
   (select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'restaurant') as restaurant, 
   (select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'bar') as bar 
where ST_DWithin(school.way_geo, restaurant.way_geo, 500, false) 
  and ST_DWithin(school.way_geo, bar.way_geo, 500, false);

This query gives me what I want, but it takes really long time, like 13 seconds to execute. I'm wondering if there is another way to write the query and make it more efficient.

Query plan:

Nested Loop  (cost=74.43..28618.65 rows=1 width=177) (actual time=33.513..11235.212 rows=10591 loops=1)
   Buffers: shared hit=530967 read=8733
   ->  Nested Loop  (cost=46.52..28586.46 rows=1 width=174) (actual time=31.998..9595.212 rows=4235 loops=1)
         Buffers: shared hit=389863 read=8707
         ->  Bitmap Heap Scan on planet_osm_point  (cost=18.61..2897.83 rows=798 width=115) (actual time=7.862..150.607 rows=8811 loops=1)
               Recheck Cond: (amenity = 'school'::text)
               Buffers: shared hit=859 read=5204
               ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=5.416..5.416 rows=8811 loops=1)
                     Index Cond: (amenity = 'school'::text)
                     Buffers: shared hit=3 read=24
         ->  Bitmap Heap Scan on planet_osm_point planet_osm_point_1  (cost=27.91..32.18 rows=1 width=115) (actual time=1.064..1.069 rows=0 loops=8811)
               Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'restaurant'::text))
               Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
               Rows Removed by Filter: 0
               Buffers: shared hit=389004 read=3503
               ->  BitmapAnd  (cost=27.91..27.91 rows=1 width=0) (actual time=1.058..1.058 rows=0 loops=8811)
                     Buffers: shared hit=384528 read=2841
                     ->  Bitmap Index Scan on idx_planet_osm_point_waygeo  (cost=0.00..9.05 rows=137 width=0) (actual time=0.193..0.193 rows=64 loops=8811)
                           Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                           Buffers: shared hit=146631 read=2841
                     ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=0.843..0.843 rows=6291 loops=8811)
                           Index Cond: (amenity = 'restaurant'::text)
                           Buffers: shared hit=237897
   ->  Bitmap Heap Scan on planet_osm_point planet_osm_point_2  (cost=27.91..32.18 rows=1 width=115) (actual time=0.375..0.383 rows=3 loops=4235)
         Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'bar'::text))
         Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
         Rows Removed by Filter: 1
         Buffers: shared hit=141104 read=26
         ->  BitmapAnd  (cost=27.91..27.91 rows=1 width=0) (actual time=0.368..0.368 rows=0 loops=4235)
               Buffers: shared hit=127019
               ->  Bitmap Index Scan on idx_planet_osm_point_waygeo  (cost=0.00..9.05 rows=137 width=0) (actual time=0.252..0.252 rows=363 loops=4235)
                     Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                     Buffers: shared hit=101609
               ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=0.104..0.104 rows=779 loops=4235)
                     Index Cond: (amenity = 'bar'::text)
                     Buffers: shared hit=25410
 Total runtime: 11238.605 ms

I'm only using one table at the moment with 1,372,711 rows. It has 73 columns:

       Column       |         Type         |       Modifiers
--------------------+----------------------+---------------------------
 osm_id             | bigint               | 
 access             | text                 | 
 addr:housename     | text                 | 
 addr:housenumber   | text                 | 
 addr:interpolation | text                 | 
 admin_level        | text                 | 
 aerialway          | text                 | 
 aeroway            | text                 | 
 amenity            | text                 | 
 area               | text                 | 
 barrier            | text                 | 
 bicycle            | text                 | 
 brand              | text                 | 
 bridge             | text                 | 
 boundary           | text                 | 
 building           | text                 | 
 capital            | text                 | 
 construction       | text                 | 
 covered            | text                 | 
 culvert            | text                 | 
 cutting            | text                 | 
 denomination       | text                 | 
 disused            | text                 | 
 ele                | text                 | 
 embankment         | text                 | 
 foot               | text                 | 
 generator:source   | text                 | 
 harbour            | text                 | 
 highway            | text                 | 
 historic           | text                 | 
 horse              | text                 | 
 intermittent       | text                 | 
 junction           | text                 | 
 landuse            | text                 | 
 layer              | text                 | 
 leisure            | text                 | 
 lock               | text                 | 
 man_made           | text                 | 
 military           | text                 | 
 motorcar           | text                 | 
 name               | text                 | 
 natural            | text                 | 
 office             | text                 | 
 oneway             | text                 | 
 operator           | text                 | 
 place              | text                 | 
 poi                | text                 | 
 population         | text                 | 
 power              | text                 | 
 power_source       | text                 | 
 public_transport   | text                 | 
 railway            | text                 | 
 ref                | text                 | 
 religion           | text                 | 
 route              | text                 | 
 service            | text                 | 
 shop               | text                 | 
 sport              | text                 | 
 surface            | text                 | 
 toll               | text                 | 
 tourism            | text                 | 
 tower:type         | text                 | 
 tunnel             | text                 | 
 water              | text                 | 
 waterway           | text                 | 
 wetland            | text                 | 
 width              | text                 | 
 wood               | text                 | 
 z_order            | integer              | 
 tags               | hstore               | 
 way                | geometry(Point,4326) | 
 way_geo            | geography            | 
 gid                | integer              | not null default nextval('...
Indexes:
    "planet_osm_point_pkey1" PRIMARY KEY, btree (gid)
    "idx_planet_osm_point_amenity" btree (amenity)
    "idx_planet_osm_point_waygeo" gist (way_geo)
    "planet_osm_point_index" gist (way)
    "planet_osm_point_pkey" btree (osm_id)

There are 8811, 6291, 779 rows in amenity school, restaurant and bar respectively.


Solution

  • This query should go a long way (be much faster):

    WITH school AS (
       SELECT s.osm_id AS school_id, text 'school' AS type, s.osm_id, s.name, s.way_geo
       FROM   planet_osm_point s
            , LATERAL (
          SELECT  1 FROM planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'bar'
          LIMIT   1  -- bar exists -- most selective first if possible
          ) b
            , LATERAL (
          SELECT  1 FROM planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'restaurant'
          LIMIT   1  -- restaurant exists
          ) r
       WHERE  s.amenity = 'school'
       )
    SELECT * FROM (
       TABLE school  -- schools
    
       UNION ALL  -- bars
       SELECT s.school_id, 'bar', x.*
       FROM   school s
            , LATERAL (
          SELECT  osm_id, name, way_geo
          FROM    planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'bar'
          ) x
    
       UNION ALL  -- restaurants
       SELECT s.school_id, 'rest.', x.*
       FROM   school s
            , LATERAL (
          SELECT  osm_id, name, way_geo
          FROM    planet_osm_point
          WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
          AND     amenity = 'restaurant'
          ) x
       ) sub
    ORDER BY school_id, (type <> 'school'), type, osm_id;
    

    This is not the same as your original query, but rather what you actually want, as per discussion in comments:

    I want a list of schools that have restaurants and bars within 500 meters and I need the coordinates of each school and its corresponding restaurants and bars.

    So this query returns a list of those schools, followed by bars and restaurants nearby. Each set of rows is held together by the osm_id of the school in the column school_id.

    Now using LATERAL joins, to make use of the spatial GiST index.

    TABLE school is just shorthand for SELECT * FROM school:

    The expression (type <> 'school') orders the school in each set first, because:

    The subquery sub in the final SELECT is only needed to order by this expression. A UNION query limits an attached ORDER BY list to only columns, no expressions.

    I focus on the query you presented for the purpose of this answer - ignoring the extended requirement to filter on any of the other 70 text columns. That's really a design flaw. The search criteria should be concentrated in few columns. Or you'll have to index all 70 columns, and multicolumn indexes like I am going to propose are hardly an option. Still possible though ...

    Index

    In addition to the existing:

    "idx_planet_osm_point_waygeo" gist (way_geo)
    

    If always filtering on the same column, you could create a multicolumn index covering the few columns you are interested in, so index-only scans become possible:

    CREATE INDEX planet_osm_point_bar_idx ON planet_osm_point (amenity, name, osm_id)
    

    Postgres 9.5

    The upcoming Postgres 9.5 introduces major improvements that happen to address your case exactly:

    • Allow queries to perform accurate distance filtering of bounding-box-indexed objects (polygons, circles) using GiST indexes (Alexander Korotkov, Heikki Linnakangas)

      Previously, a common table expression was required to return a large number of rows ordered by bounding-box distance, and then filtered further with a more accurate non-bounding-box distance calculation.

    • Allow GiST indexes to perform index-only scans (Anastasia Lubennikova, Heikki Linnakangas, Andreas Karlsson)

    That's of particular interest for you. Now you can have a single multicolumn (covering) GiST index:

    CREATE INDEX reservations_range_idx ON reservations
    USING gist(amenity, way_geo, name, osm_id)
    

    And:

    • Improve bitmap index scan performance (Teodor Sigaev, Tom Lane)

    And:

    • Add GROUP BY analysis functions GROUPING SETS, CUBE and ROLLUP (Andrew Gierth, Atri Sharma)

    Why? Because ROLLUP would simplify the query I suggested. Related answer:

    The first alpha version has been released on July 2, 2015. The expected timeline for the release:

    This is the alpha release of version 9.5, indicating that some changes to features are still possible before release. The PostgreSQL Project will release 9.5 beta 1 in August, and then periodically release additional betas as required for testing until the final release in late 2015.

    Basics

    Of course, be sure not to overlook the basics: