Search code examples
sqlsql-servergeography

SQL Server 2012 geography - find minimum of multiple rows


I am trying to find the distance to the nearest bike rack to the location of a crime. I have a database table with 5 million crimes and 5,000 bike racks. What I need to do now is find the nearest bike rack to each of the 5 million crimes. I tried creating a python script using geopy but it was simply too time consuming in processing (would have taken weeks at least). I have been searching online extensively, I have found that SQL Server 2012 has a geography datatype so I created a column containing that geography datatype from the latitude and longitude points of crimes and bike racks.

Now I am trying to do something like this:

update Crimes set distanceToNearestBikeRack = crimeGeo.STDistance((select geo from bike_racks))

Unfortunately, this would cause the subquery to return many rows and that throws:

Msg 512, Level 16, State 1, Line 1 Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression. The statement has been terminated.

How can I find the nearest point from a multitude of points?


Solution

  • Normally you would use MIN to find a minimum of a set of values, but it appears it doesn't play well with spatial methods. This works, however:

    UPDATE
       Crimes
    SET
        distanceToNearestBikeRack = 
            (SELECT TOP 1 crimeGeo.STDistance(BR.geo)
            FROM bike_racks BR
            ORDER BY crimeGeo.STDistance(BR.geo) ASC)
    

    Of course, with five million crimes and five thousand bike racks, this could take... a while. So you'll probably want to look into batching up the updates.