Search code examples
sqlpostgresqloptimizationpostgislateral-join

How to optimize sql query with subqueries, perhaps by lateral join?


I'm trying to optimize sophisticated sql query, it would be executed on each map bonding box change. I thought that INNER LATERAL JOIN would be fastest but it isn't. Does anybody know how to speed up this query and how to make better use of LATERAL JOIN?

The fastest query I've made:

SELECT r0."id", r0."name" 
FROM "hiking"."routes" AS r0 
INNER JOIN "hiking"."hierarchy" AS h1 ON r0."id" = h1."parent" 
INNER JOIN (SELECT DISTINCT unnest(s0."rels") AS "rel" 
            FROM "hiking"."segments" AS s0 
            WHERE (ST_Intersects(s0."geom", ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1285982.015631 7217169.814674)', -1), ST_GeomFromText('POINT(2371999.313507 6454022.524275)', -1)), 3857)))) AS s2 ON TRUE 
WHERE (s2."rel" = h1."child");

Planning time: ~0.605 ms Execution time: ~37.232 ms

Actually the same as above but with LATERAL JOIN, is it correct that it is slower?

SELECT r0."id", r0."name" 
FROM "hiking"."routes" AS r0 
INNER JOIN "hiking"."hierarchy" AS h1 ON r0."id" = h1."parent" 
INNER JOIN LATERAL (SELECT DISTINCT unnest(s0."rels") AS "rel" 
                    FROM "hiking"."segments" AS s0 
                    WHERE (ST_Intersects(s0."geom", ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1285982.015631 7217169.814674)', -1), ST_GeomFromText('POINT(2371999.313507 6454022.524275)', -1)), 3857)))) AS s2 ON TRUE 
WHERE (s2."rel" = h1."child");

Planning time: ~1.353 ms Execution time: ~38.518 ms

Slowest query with subquery in subquery (that was my first so I've improved it a bit):

SELECT r0."id", r0."name" 
FROM "hiking"."routes" AS r0 
INNER JOIN (SELECT DISTINCT h0."parent" AS "parent" 
            FROM "hiking"."hierarchy" AS h0 
            INNER JOIN (SELECT DISTINCT unnest(s0."rels") AS "rel" 
                        FROM "hiking"."segments" AS s0 
                        WHERE (ST_Intersects(s0."geom", ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1285982.015631 7217169.814674)', -1), ST_GeomFromText('POINT(2371999.313507 6454022.524275)', -1)), 3857)))) AS s1 ON TRUE 
            WHERE (h0."child" = s1."rel")) AS s1 ON TRUE 
WHERE (r0."top" AND (r0."id" = s1."parent"));

Planning time: ~1.017 ms Execution time: ~41.288 ms


Solution

  • It is hard to reproduce the logic of your query without any knowledges about your DB but I will try, so be patient:

    SELECT r0."id", r0."name" 
    FROM "hiking"."routes" AS r0 
    INNER JOIN "hiking"."hierarchy" AS h1 ON r0."id" = h1."parent" 
    WHERE 
      EXISTS (
        SELECT 1
        FROM "hiking"."segments" AS s0 
        WHERE (
          ST_Intersects(
            s0."geom",
            ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1285982.015631 7217169.814674)', -1), ST_GeomFromText('POINT(2371999.313507 6454022.524275)', -1)),
            3857)))
          AND array[h1."child"] <@ s0."rels");
    

    There are two points:

    1. Filtering data by EXISTS or NOT EXISTS sometimes faster then by joining
    2. Instead of unnesting array field to compare its elements with some value you could to use the array comparison operator(s). Having appropriate GIN index it is much faster (docs here and here).

    Here is simple example how to use indexes on arrays and how its faster:

    create table foo(bar int[]);
    insert into foo(bar) select array[1,2,3,x] from generate_series(1,1000000) as x;
    create index idx on foo using gin (bar); // Note this
    select * from foo where 666 in (select unnest(bar)); // 6936,345 ms on my HW
    select * from foo where array[666] <@ bar; // 45,524 ms