I wish to do the following pseudo-code in SQL without the use of a CURSOR
, if possible.
for each zipcode
{
-- What city is this zipcode in? A zipcode can be in multiple cities,
-- but one city area is always way greater that the others.
-- Eg. 90210 is 96% in the city of Beverly Hills.
-- The remaining 4% is that it s in some other fringe cties ..
--but only just (some mismatch mapping errors).
** grab the city largest shape area which this zip is a part of **
}
Now, I have some SQL to help us out.
-- All zipcodes and each boundary/shapefile.
SELECT ZipCodeId, Boundary FROM ZipCodes -- Boundary is a GEOGRAPHY field type.
To determine if a zipcode boundary is in a city....
SELECT CityId, CityName,
City.Boundary.Intersection(@someZipCodeBoundary).STArea() AS Area
FROM Cities
WHERE City.Boundary.Intersects(@someZipCodeBoundary) = 1
and to get the area of intersection (because we want the highest area of intersection ie. TOP(1) ORDER BY Area DESC
or a DISTINCT with an ORDER BY
sort of thing. We use the Intersection
SQL method.
Note: Intersects
and Intersection
are two different Sql methods.
Ok - got it ;) The trick was to use a PARTITION BY
. @In Sane gave me the idea when I realised I've done something similar, before :)
So .. here we go..
SELECT
a.CityId, a.Name, b.ZipCodeId, b.Code, a.Boundary.STIntersection(b.Boundary).STArea() AS Area,
RANK() OVER (PARTITION BY b.ZipCodeId ORDER BY a.Boundary.STIntersection(b.Boundary).STArea() DESC) AS Area_Rank
FROM
Cities a
INNER JOIN ZipCodes b on a.Boundary.STIntersects(b.Boundary) = 1
where b.Code = 12010 OR b.Code = 90210
CityId Name ZipCodeId Code Area Area_Rank
----------- ---------------------------------------------------------------------------------------------------- ----------- -------------------- ---------------------- --------------------
2166 Los Angeles 9331 90210 13235413.8430175 1
1686 Beverly Hills 9331 90210 10413397.1372613 2
2635 West Hollywood 9331 90210 0 3
14570 Amsterdam 29779 12010 15369521.9602067 1
14921 Hagaman 29779 12010 1394562.70390667 2
14856 Fort Johnson 29779 12010 211058.884834718 3
14651 Broadalbin 29779 12010 0 4
(7 row(s) affected)
So in this filtered example (Filtered by ZipCode 12010 or 90210), we can see that this zipcode exists in 4 different cities/towns. Each zipcode can have 1 to many results, which are then ordered by the Area value .. but the key here is the PARTITION
keyword .. which does this ordering by ZipCode groups or partitions. Very funky :) Notice how the zipcode 90210 has it's own rank results? same with 12010 ?
Next, we make that a subquery, and just grab all the Rank == 1 :)
SELECT CityId, Name, ZipCodeId, Code, Area_Rank
FROM (
SELECT
a.CityId, a.Name, b.ZipCodeId, b.Code, a.Boundary.STIntersection(b.Boundary).STArea() AS Area,
RANK() OVER (PARTITION BY b.ZipCodeId ORDER BY a.Boundary.STIntersection(b.Boundary).STArea() DESC) AS Area_Rank
FROM
Cities a
INNER JOIN ZipCodes b on a.Boundary.STIntersects(b.Boundary) = 1
where b.Code = 12010 OR b.Code = 90210
) subQuery
WHERE Area_Rank = 1
CityId Name ZipCodeId Code Area_Rank
----------- ---------------------------------------------------------------------------------------------------- ----------- -------------------- --------------------
14570 Amsterdam 29779 12010 1
2166 Los Angeles 9331 90210 1
(2 row(s) affected)
Sweet as candy :)
Side Note: This also shows me that my Los Angeles city shapefile/boundary is corrupted, because it's intersecting the zipcode 90210 far too much (which I visually confirmed :P)