I am trying without success to calculate building heights in my city using the LIDAR satellite dataset.
System specs
Postgres setup
I am using the latest version of Postgres v12.1 database with PostGIS with the following tweaks recommended in different sources:
shared_buffers = 256MB
maintenance_work_mem = 4GB
max_parallel_maintenance_workers = 7
max_parallel_workers = 7
max_wal_size = 60GB
min_wal_size = 90MB
random_page_cost = 1.0
Database setup
CREATE INDEX lidar_idx ON lidar USING GIST (geom);
Height calculation
Currently in order to calculate the height of a building, it is necessary to check if each one of the 3000 million points (rows) is inside the area of each building and calculate the average of all the points found inside a building area.
The queries I have tried are taking forever (probably more than 5 days or even more) and I would like to simplify the query so that I can get the height of the building with a lot less points, without having to compare with all the insane 3000 million records each time for each building.
In example:
ST_Within(l.geom, e.geom)
), and once those 100 records are found, pass to the next building.My main query is
SELECT e.gid, AVG(l.z) AS height
FROM lidar l,
buildings e
WHERE ST_Within(l.geom, e.geom)
GROUP BY e.gid) t
I have tried with another query, but I can not get it to work.
SELECT e.gid, AVG(l.z), COUNT(1) FILTER (WHERE ST_Within(l.geom, e.geom)) AS gidc
FROM lidar l, buildings e
WHERE gidc < 100
GROUP BY e.gid
I don't think you really want to do this at all. You should first try to make the correct query faster rather than compromising correctness by working with an arbitrary (but not random) subset of the data.
But if you do want it, then you can use a lateral join.
SELECT e.gid from
buildings e cross join lateral
(select AVG(l.z) AS height FROM lidar l WHERE ST_Within(l.geom, e.geom) LIMIT 100)
it is necessary to check if each one of the 3000million points (rows) is inside the area of each building and calculate the average of all the points found inside a building area.
This is exactly what a geometry index is for. You don't need to look at every point to get just the ones inside the a building area. If you don't have the right index, such as on lidar using gist (geom)
, then the lateral join query will also be awful.