I have a table in BigQuery that has specific lat/long locations around the world. I want to be able to join to a table and get the Continent, Country, City and Suburb of that geopoint.
How can I do that?
You need some table with the country data, that describes the countries, cities etc as polygons. Note that this might be complicated depending on your needs - different jurisdictions might have different ideas about country boundaries.
Once you have these - you can join with your data using ST_Intesects
condition, like ST_Intersects(reference_geography, ST_GeogPoint(lon, lat))
(note lon:lat coordinate order, rather than lat:lot frequently used in other contexts).
One such dataset you might use is Open Street Maps data in BigQuery public dataset. Here is how to query a single point:
CREATE TEMP FUNCTION tag_value(tags ARRAY<STRUCT<key STRING, value STRING>>, name STRING)
RETURNS STRING AS (
(SELECT value FROM UNNEST(tags) WHERE key = name)
);
SELECT
tag_value(all_tags, 'admin_level') as admin_level,
tag_value(all_tags, 'name') as name
FROM (
SELECT all_tags
FROM `bigquery-public-data.geo_openstreetmap.planet_features_multipolygons`
WHERE ST_Intersects(geometry, ST_GeogPoint(-122.193476, 47.696147))
AND tag_value(all_tags, 'admin_level') IS NOT NULL
)
ORDER BY admin_level;
Result:
Row admin_level name
1 2 United States
2 4 Washington
3 6 King County
4 8 Kirkland
Or join with your data
CREATE TEMP FUNCTION tag_value(
tags ARRAY<STRUCT<key STRING, value STRING>>, name STRING)
RETURNS STRING AS (
(SELECT value FROM UNNEST(tags) WHERE key = name)
);
WITH reference_data AS (
SELECT
CAST(tag_value(all_tags, 'admin_level') AS INT64) as admin_level,
tag_value(all_tags, 'name') as name,
geometry
FROM (
SELECT all_tags, geometry
FROM `bigquery-public-data.geo_openstreetmap.planet_features_multipolygons`
WHERE SAFE_CAST(tag_value(all_tags, 'admin_level') AS INT64) IS NOT NULL
)
), my_points AS (
SELECT 1 as id, ST_GeogPOINT(90.3781398, 23.766686606993) as p
UNION ALL
SELECT 2 as id, ST_GeogPOINT(-74.013506, 40.7130162636655) as p
)
SELECT id, admin_level, name
FROM reference_data JOIN my_points
ON ST_Intersects(geometry, p)
ORDER BY id, admin_level;
One current restriction of BigQuery - you cannot use outer join here (e.g. LEFT JOIN) - the performance would be much worse.
id admin_level name
1 2 বাংলাদেশ
1 4 ঢাকা বিভাগ
1 5 ঢাকা জেলা
2 2 United States
2 4 New York
2 5 New York
2 6 New York County
2 7 Manhattan
2 10 Manhattan Community Board 1
Being open source volunteer-maintained data, the quality of OSM dataset varies among locations, so check data before usage.