I have two tables with data. I need to set data building_h from one table to another with respect to the spatial relationships between data in these tables. Spatial data in the source table is polygon type and in destination table - points. I made a query for points and polygons which intersect. It works nice. But there are some points that do not intersect with polygons. My idea is to define for each of these points the closest polygon and take the building_h value from it. For this purpose, I wrote the function which takes the id of the point and returns the building_h value. And it works ok in the test.
CREATE OR REPLACE FUNCTION closest_pol(int4)
RETURNS NUMERIC
AS
$$
DECLARE
retVal NUMERIC;
BEGIN
SELECT bgs.building_h INTO retVal
FROM buildings_geoalert_spgg bgs, building_from_landuse_spgg bfl
WHERE ST_INTERSECTS(bgs.geom, ST_BUFFER(bfl.geom_centr, 0.0006,'quad_segs=8')) AND bfl.id = $1
ORDER BY ST_INTERSECTION(bgs.geom, ST_BUFFER(bfl.geom_centr, 0.0006,'quad_segs=8')) ASC
LIMIT 1;
RETURN retVal;
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
But when I run the query for the whole table, it executes forever.. (for ~3000 rows with NULL value). It lasted several hours before I stopped it.
UPDATE building_from_landuse_spgg AS bfl SET
building_h = (SELECT closest_pol(bfl.id))
WHERE bfl.building_h IS NULL;
Do you have an idea what I did wrong?
In your WHERE
clause you're using ST_Intersects
with 0.006 buffer that is created in query time. Consider using a partial gist
index with this buffer of yours (if for whatever reason you must use them):
CREATE INDEX idx_buffer ON building_from_landuse_spgg USING gist (ST_Buffer(geom_centr, 0.0006,'quad_segs=8'));
I'd argue that you do not need a function for that, since you could use the query inside of the function as a subquery in your UPDATE
statement. But if you have your reasons to stick to the function, you can get rid of the SELECT
to call it:
UPDATE building_from_landuse_spgg
SET building_h = closest_pol(id)
WHERE building_h IS NULL;
EDIT: As correctly mentioned by @JGH (see comments), using buffers is less efficient than simply using ST_DWithin
. So, if you can afford it, create an index in the geometries used in the WHERE
clause ..
CREATE INDEX idx_landuse_geom_centr ON building_from_landuse_spgg USING gist (geom_centr);
CREATE INDEX idx_geoalert_geom ON buildings_geoalert_spgg USING gist (geom);
.. and also partially indexing building_h
would speed up things a bit, since you're only interested in the NULL
records:
CREATE INDEX idx_landuse_building_h ON building_from_landuse_spgg (building_h)
WHERE building_h IS NULL;
Or if you prefer a broader index but still putting NULL
values in the first class ..
CREATE INDEX idx_landuse_building_h ON building_from_landuse_spgg
(building_h NULLS FIRST);
And perhaps consider putting the code of your function in a subquery, e.g.
UPDATE building_from_landuse_spgg AS bfl
SET building_h = (
SELECT bgs.building_h
FROM buildings_geoalert_spgg bgs
WHERE ST_DWithin(bgs.geom, bfl.geom_centr, 0.0006)
ORDER BY ST_Distance(bgs.geom, bfl.geom_centr) ASC LIMIT 1)
WHERE bfl.building_h IS NULL;
Further reading: