Search code examples
sqlpostgresqlpostgis

How to improve this chain of SELECT queries?


I have this functioning query that builds a weighted average of the polygons in a reference table that intersect the polygons in a master table.

Is there a better way to structure the query than a set of nested selects? Both in terms of performance and readability. I considered using a series of temporary tables but that seems like it would be more complicated rather than less.

UPDATE tmp_master_geom AS master
SET weighted_value = sub_query.weighted_average
FROM
    -- calc weighted average column
    (SELECT
        the_id,
        (weighted_sum / total_area) AS weighted_average
        -- the_whole_intersection AS geom
    FROM
         -- group intersection pieces by original id
        (SELECT
            the_id,
            st_union(the_intersection) AS the_whole_intersection,
            SUM(the_area) AS total_area,
            SUM(the_area * ref_value) AS weighted_sum
        FROM
            -- get all intersections between master and reference
            (SELECT
                tmg.ID AS the_id,
                st_astext(trgl.geom) AS the_original,
                st_astext(st_intersection(tmg.geom, trgl.geom)) AS the_intersection,
                st_area(st_intersection(tmg.geom, trgl.geom)) AS the_area,
                trgl.ref_value AS ref_value
            FROM
                tmp_master_geom tmg,
                tmp_ref_geoms_larger trgl
            where st_intersects(tmg.geom, trgl.geom)
        ) AS intersection_table
    GROUP BY 
          the_id
    ) AS sum_table
    ) AS sub_query
WHERE
    master.id = sub_query.the_id
;

My next step is to convert this from a query into a reusable function so suggestions with that in mind are especially appreciated.


Solution

  • Is there a better way to structure the query than a set of nested selects? Both in terms of performance and readability.

    If you're concerned about the readability of your code, I'd say you've got nothing to worry about. But if your main concern is performance, you might wanna EXPLAIN your query to identify possible bottlenecks.

    I considered using a series of temporary tables but that seems like it would be more complicated rather than less.

    Depending on the resultset size, temporary tables might add some performance to your code! If you're dealing with really large tables, creating a temporary table and indexing the columns you're searching can be faster than querying over subselects or CTEs.

    My next step is to convert this from a query into a reusable function so suggestions with that in mind are especially appreciated.

    If you're planing to pack the whole thing into a function, you can use this structure:

    CREATE OR REPLACE FUNCTION gimme_the_geom() 
    RETURNS GEOMETRY AS $BODY$
    DECLARE geom GEOMETRY;
    BEGIN
    --your huge query goes here + INTO geom;
    SELECT 'POINT(1 2)'::GEOMETRY INTO geom; 
    RETURN geom; 
    END;
    $BODY$
    LANGUAGE plpgsql;
    

    Usage

    SELECT gimme_the_geom();
    
                   gimme_the_geom               
    --------------------------------------------
     0101000000000000000000F03F0000000000000040
    (1 Zeile)