Suppose I have a table called gps_points in Big query:
ID | lon | lat
and using OSM, for each point in gps_points table, I want to see what is the road type (motorway...)
The main challenge is that the points in gps_points may not be exactly the same as those in OSM.
For example, running this query returns only 1 point:
with points as
(
select st_geogpoint(lon, lat) gpoint FROM `prj-test.gps_points` LIMIT 1000
)
select gpoint from points inner join `bigquery-public-data.geo_openstreetmap.planet_features` osm
ON ST_CONTAINS(geometry, gpoint)
WHERE 'highway' IN (SELECT key FROM UNNEST(all_tags)) limit 1000
I am struggling in:
ON ST_CONTAINS(geometry, gpoint)
. I can add WHERE ST_Distance(geometry, gpoint) < 5
for example, but that only works if I do cross join. My tables are huge and I am not sure if cross join is the best answer.
Is there an efficient way to do this kind of lookup?
For BigQuery there is not much difference between
INNER JOIN ... ON ST_CONTAINS(geometry, gpoint)
,
CROSS JOIN ... WHERE ST_Distance(geometry, gpoint) < 5
,
or (also legal)
INNER JOIN ... ON ST_Distance(geometry, gpoint) < 5
.
They are optimized similarly by the query planner, and you get optimized spatial join as described in the doc: https://cloud.google.com/bigquery/docs/geospatial-data#using_joins_with_spatial_data
Also, even though that doc does not mention ST_Distance(...) < ...
, it works with this function too! Such condition is rewritten automatically as ST_DWithin(geometry, gpoint, 5)
by the query planner. You'll see CROSS JOIN ON in the Execution Details, but the join is still optimized. See https://mentin.medium.com/execution-details-for-spatial-join-7274356e0115 to check if it works in specific case.