Search code examples
sqlpostgresqlpostgis

How can I set data from on table to another according spatial relation geometries in these tables


I have two tables with data. I need to set data building_h from one table to another with respect to the spatial relationships between data in these tables. Spatial data in the source table is polygon type and in destination table - points. I made a query for points and polygons which intersect. It works nice. But there are some points that do not intersect with polygons. My idea is to define for each of these points the closest polygon and take the building_h value from it. For this purpose, I wrote the function which takes the id of the point and returns the building_h value. And it works ok in the test.

CREATE OR REPLACE FUNCTION closest_pol(int4)
RETURNS NUMERIC 
AS
$$
DECLARE
    retVal NUMERIC;
BEGIN
    SELECT bgs.building_h INTO retVal
                FROM buildings_geoalert_spgg bgs, building_from_landuse_spgg bfl 
                WHERE ST_INTERSECTS(bgs.geom, ST_BUFFER(bfl.geom_centr, 0.0006,'quad_segs=8')) AND bfl.id = $1
                ORDER BY ST_INTERSECTION(bgs.geom, ST_BUFFER(bfl.geom_centr, 0.0006,'quad_segs=8')) ASC 
                LIMIT 1;
RETURN retVal;
END;
$$
LANGUAGE plpgsql 
   STABLE 
RETURNS NULL ON NULL INPUT;

But when I run the query for the whole table, it executes forever.. (for ~3000 rows with NULL value). It lasted several hours before I stopped it.

UPDATE building_from_landuse_spgg AS bfl SET 
    building_h = (SELECT closest_pol(bfl.id))
    WHERE bfl.building_h IS NULL;

Do you have an idea what I did wrong?


Solution

  • In your WHERE clause you're using ST_Intersects with 0.006 buffer that is created in query time. Consider using a partial gist index with this buffer of yours (if for whatever reason you must use them):

    CREATE INDEX idx_buffer ON building_from_landuse_spgg USING gist (ST_Buffer(geom_centr, 0.0006,'quad_segs=8'));
    

    I'd argue that you do not need a function for that, since you could use the query inside of the function as a subquery in your UPDATE statement. But if you have your reasons to stick to the function, you can get rid of the SELECT to call it:

    UPDATE building_from_landuse_spgg 
    SET building_h = closest_pol(id)
    WHERE building_h IS NULL;
    

    EDIT: As correctly mentioned by @JGH (see comments), using buffers is less efficient than simply using ST_DWithin. So, if you can afford it, create an index in the geometries used in the WHERE clause ..

    CREATE INDEX idx_landuse_geom_centr ON building_from_landuse_spgg USING gist (geom_centr);
    CREATE INDEX idx_geoalert_geom ON buildings_geoalert_spgg USING gist (geom);
    

    .. and also partially indexing building_h would speed up things a bit, since you're only interested in the NULL records:

    CREATE INDEX idx_landuse_building_h ON building_from_landuse_spgg (building_h) 
    WHERE building_h IS NULL;
    

    Or if you prefer a broader index but still putting NULL values in the first class ..

    CREATE INDEX idx_landuse_building_h ON building_from_landuse_spgg 
      (building_h NULLS FIRST);
    

    And perhaps consider putting the code of your function in a subquery, e.g.

    UPDATE building_from_landuse_spgg AS bfl 
    SET building_h = (
      SELECT bgs.building_h 
      FROM   buildings_geoalert_spgg bgs 
      WHERE  ST_DWithin(bgs.geom, bfl.geom_centr, 0.0006)
      ORDER BY ST_Distance(bgs.geom, bfl.geom_centr) ASC LIMIT 1)
    WHERE bfl.building_h IS NULL;
    

    Further reading: