Search code examples
sqlgoogle-bigquerylocationgeospatiallatitude-longitude

Lat and Long in BigQuery, how do I find the country, continent, city and suburb?


I have a table in BigQuery that has specific lat/long locations around the world. I want to be able to join to a table and get the Continent, Country, City and Suburb of that geopoint.

How can I do that?


Solution

  • You need some table with the country data, that describes the countries, cities etc as polygons. Note that this might be complicated depending on your needs - different jurisdictions might have different ideas about country boundaries.

    Once you have these - you can join with your data using ST_Intesects condition, like ST_Intersects(reference_geography, ST_GeogPoint(lon, lat)) (note lon:lat coordinate order, rather than lat:lot frequently used in other contexts).

    One such dataset you might use is Open Street Maps data in BigQuery public dataset. Here is how to query a single point:

    CREATE TEMP FUNCTION tag_value(tags ARRAY<STRUCT<key STRING, value STRING>>, name STRING)
    RETURNS STRING AS (
      (SELECT value FROM UNNEST(tags) WHERE key = name)
    );
    
    SELECT 
      tag_value(all_tags, 'admin_level') as admin_level,
      tag_value(all_tags, 'name') as name
    FROM (
      SELECT all_tags 
      FROM `bigquery-public-data.geo_openstreetmap.planet_features_multipolygons`
      WHERE ST_Intersects(geometry, ST_GeogPoint(-122.193476, 47.696147))
        AND tag_value(all_tags, 'admin_level') IS NOT NULL
    )
    ORDER BY admin_level;
    

    Result:

    Row   admin_level    name
    1     2              United States
    2     4              Washington
    3     6              King County
    4     8              Kirkland
    

    Or join with your data

    CREATE TEMP FUNCTION tag_value(
        tags ARRAY<STRUCT<key STRING, value STRING>>, name STRING)
    RETURNS STRING AS (
      (SELECT value FROM UNNEST(tags) WHERE key = name)
    );
    
    WITH reference_data AS (
      SELECT 
        CAST(tag_value(all_tags, 'admin_level') AS INT64) as admin_level,
        tag_value(all_tags, 'name') as name,
        geometry
      FROM (
        SELECT all_tags, geometry
        FROM `bigquery-public-data.geo_openstreetmap.planet_features_multipolygons`
        WHERE SAFE_CAST(tag_value(all_tags, 'admin_level') AS INT64) IS NOT NULL
      )
    ), my_points AS (
      SELECT 1 as id, ST_GeogPOINT(90.3781398, 23.766686606993) as p
      UNION ALL
      SELECT 2 as id, ST_GeogPOINT(-74.013506, 40.7130162636655) as p
    )
    SELECT id, admin_level, name 
    FROM reference_data JOIN my_points
    ON ST_Intersects(geometry, p)
    ORDER BY id, admin_level;
    

    One current restriction of BigQuery - you cannot use outer join here (e.g. LEFT JOIN) - the performance would be much worse.

    id  admin_level name
    1   2   বাংলাদেশ
    1   4   ঢাকা বিভাগ
    1   5   ঢাকা জেলা
    2   2   United States
    2   4   New York
    2   5   New York
    2   6   New York County
    2   7   Manhattan
    2   10  Manhattan Community Board 1
    

    Being open source volunteer-maintained data, the quality of OSM dataset varies among locations, so check data before usage.