I am fiddeling around with this dataset http://s3.cleverelephant.ca/postgis-workshop-2020.zip. It is used in this workshop http://postgis.net/workshops/postgis-intro/spatial_relationships.html.
I want to identify all the features, that do not have a subway station. I thought this spatial join is rather straight forward
SELECT
census.boroname,
COUNT(census.boroname)
FROM nyc_census_blocks AS census
JOIN nyc_subway_stations AS subway
ON ST_Disjoint(census.geom, subway.geom)
GROUP BY census.boroname;
However, the result set is waaaaay to large.
"Brooklyn" 4753693
"Manhattan" 1893156
"Queens" 7244123
"Staten Island" 2473146
"The Bronx" 2683246
When I run a test
SELECT COUNT(id) FROM nyc_census_blocks;
I get 38794
as a result. So there are way less features in nyc_census_blocks
than I have in the result-set from the spatial join.
Why is that? Where is the mistake I am making?
The problem is that with ST_Disjoint
you're getting for every record of nyc_census_block
the total number of stations that are disjoint with nyc_subway_stations
, which means in case of no intersection all records of nyc_subway_stations
(491). That's why you're getting such a high count.
Alternatively you can count how many subways and census blocks do intersect, e.g. in a CTE or subquery, and in another query count how many of them return 0
:
WITH j AS (
SELECT
gid,census.boroname,
(SELECT count(*)
FROM nyc_subway_stations subway
WHERE ST_Intersects(subway.geom,census.geom)) AS qt
FROM nyc_census_blocks AS census
)
SELECT boroname,count(*)
FROM j WHERE qt = 0
GROUP BY boroname;
boroname | count
---------------+-------
Brooklyn | 9517
Manhattan | 3724
Queens | 14667
Staten Island | 5016
The Bronx | 5396
(5 rows)