Search code examples
sql-server-2008cursorsetspatial

How can I do this 'for each' code in SQL Server 2008 without using cursors?


I wish to do the following pseudo-code in SQL without the use of a CURSOR, if possible.

for each zipcode
{
    -- What city is this zipcode in? A zipcode can be in multiple cities, 
    -- but one city area is always way greater that the others.
    -- Eg. 90210 is 96% in the city of Beverly Hills. 
    -- The remaining 4% is that it s in some other fringe cties .. 
    --but only just (some mismatch mapping errors).

    ** grab the city largest shape area which this zip is a part of **
}

Now, I have some SQL to help us out.

-- All zipcodes and each boundary/shapefile.
SELECT ZipCodeId, Boundary FROM ZipCodes -- Boundary is a GEOGRAPHY field type.

To determine if a zipcode boundary is in a city....
SELECT CityId, CityName, 
    City.Boundary.Intersection(@someZipCodeBoundary).STArea() AS Area
FROM Cities
WHERE City.Boundary.Intersects(@someZipCodeBoundary) = 1

and to get the area of intersection (because we want the highest area of intersection ie. TOP(1) ORDER BY Area DESC or a DISTINCT with an ORDER BY sort of thing. We use the Intersection SQL method.

Note: Intersects and Intersection are two different Sql methods.


Solution

  • Ok - got it ;) The trick was to use a PARTITION BY. @In Sane gave me the idea when I realised I've done something similar, before :)

    So .. here we go..

    SELECT 
        a.CityId, a.Name, b.ZipCodeId, b.Code, a.Boundary.STIntersection(b.Boundary).STArea() AS Area, 
        RANK() OVER (PARTITION BY b.ZipCodeId ORDER BY a.Boundary.STIntersection(b.Boundary).STArea() DESC) AS Area_Rank
    FROM 
        Cities a
        INNER JOIN ZipCodes b on a.Boundary.STIntersects(b.Boundary) = 1
        where b.Code = 12010 OR b.Code = 90210
    
    
    CityId      Name                                                                                                 ZipCodeId   Code                 Area                   Area_Rank
    ----------- ---------------------------------------------------------------------------------------------------- ----------- -------------------- ---------------------- --------------------
    2166        Los Angeles                                                                                          9331        90210                13235413.8430175       1
    1686        Beverly Hills                                                                                        9331        90210                10413397.1372613       2
    2635        West Hollywood                                                                                       9331        90210                0                      3
    14570       Amsterdam                                                                                            29779       12010                15369521.9602067       1
    14921       Hagaman                                                                                              29779       12010                1394562.70390667       2
    14856       Fort Johnson                                                                                         29779       12010                211058.884834718       3
    14651       Broadalbin                                                                                           29779       12010                0                      4
    
    (7 row(s) affected)
    

    So in this filtered example (Filtered by ZipCode 12010 or 90210), we can see that this zipcode exists in 4 different cities/towns. Each zipcode can have 1 to many results, which are then ordered by the Area value .. but the key here is the PARTITION keyword .. which does this ordering by ZipCode groups or partitions. Very funky :) Notice how the zipcode 90210 has it's own rank results? same with 12010 ?

    Next, we make that a subquery, and just grab all the Rank == 1 :)

    SELECT CityId, Name, ZipCodeId, Code, Area_Rank
    FROM (
    SELECT 
        a.CityId, a.Name, b.ZipCodeId, b.Code, a.Boundary.STIntersection(b.Boundary).STArea() AS Area, 
        RANK() OVER (PARTITION BY b.ZipCodeId ORDER BY a.Boundary.STIntersection(b.Boundary).STArea() DESC) AS Area_Rank
    FROM 
        Cities a
        INNER JOIN ZipCodes b on a.Boundary.STIntersects(b.Boundary) = 1
        where b.Code = 12010 OR b.Code = 90210
    ) subQuery
    WHERE Area_Rank = 1
    
    
    CityId      Name                                                                                                 ZipCodeId   Code                 Area_Rank
    ----------- ---------------------------------------------------------------------------------------------------- ----------- -------------------- --------------------
    14570       Amsterdam                                                                                            29779       12010                1
    2166        Los Angeles                                                                                          9331        90210                1
    
    (2 row(s) affected)
    

    Sweet as candy :)

    Side Note: This also shows me that my Los Angeles city shapefile/boundary is corrupted, because it's intersecting the zipcode 90210 far too much (which I visually confirmed :P)

    alt text