Search code examples
google-cloud-spanner

Querying for addresses within a given radius within Spanner


What it says on the tin: how do I query for addresses in my Spanner database which are within a certain radius of a given long and lat?

For an example use case, let's say I have a database of restaurants and I'm looking for ones within ten miles of my apartment. I currently have the lat and long for each restaurant stored in the Address_geolng and Address_geolat fields as degrees. For the sake of simple data, we'll say I'm somehow living in the middle of the hot restaurant scene that is null island (e.g. 0,0).

A lot of databases have a built-in geography type or some type of prebuilt geodistance functionality, but I don't see either one in Spanner.

I've been trying to just brute force implement the Haversine formula in lieu of all else, but in all honesty my eyes are crossing here and either I'm unable to track down the relevant documentation for my use case or Spanner is missing a lot of things to help implement this more simply. (For example, it appears that their trig functions work solely in radians, but I don't see any reference anywhere to either a degree to radian conversion function or the ability to reference PI... there's gotta be something better than just grabbing ACOS(-1), I'm sure....)

So far the best effort I've got is

    COS(0) * COS(DIV(ACOS(-1),180) * Address_geolat) *
    SIN(DIV(DIV(ACOS(-1),180) * (Address1_geolng - 0)), 2) * SIN(DIV(DIV(ACOS(-1),180) * (Address1_geolng - 0), 2)) AS a FROM restaurants WHERE (3959 * 2 * ATAN2(SQRT(a), SQRT(1 - a)) <= 10)

Which I'm positive isn't even right -- my eyes are just crossing trying to sort through all of this.

Has anyone already developed a solution for this? What did you use?


Solution

  • So I am working on publishing a doc for this. You are correct that Spanner does not have geospatial support internally, but here are some tips:

    1) don't query using haversine at the top level select - it means that you have to do a full table scan over all rows with complex calculations on each one, so will be very slow on large tables

    2) start by calculating the corner coordinates of a bounding rectangle that has sides of 20 miles with your requested coordinates in the center.

    3) query for addresses where the lat-long is inside your bounding box using simple >/< operators comparing the lat-long to the corner points. As this is a simple query you can take advantage of secondary indexes on latitude and longitude to make your query much faster... (Take care at the poles and across 180° longitude!)

    4) you now have a limited set of addresses that are approximately 20miles from your requested position (some are slightly further away) You can now filter these addresses by calculating the exact distance using either haversine or spherical law of cosine

    This fine distance calculation / filtering can be done in SQL, but it may be easier to do it in your application where you have more math functions available and can use local variables to simplify things. As you only have a few rows to work with (due to the coarse filtering on the bounding box) this should be quick.

    Here is a useful web page with easier to read formulas: https://www.movable-type.co.uk/scripts/latlong.html