Search code examples
sqlgoogle-bigquerygisuser-defined-functions

ST_EXTENT or ST_ENVELOPE in BigQuery?


I want the equivalent of ST_EXTENT or ST_ENVELOPE in BigQuery, but I can't find a way to make this query run:

SELECT REGEXP_EXTRACT(name, ', (..)') state
  , ST_EXTENT(ARRAY_AGG(urban_area_geom)) corners
  , COUNT(*) cities
FROM `bigquery-public-data.geo_us_boundaries.urban_areas`
GROUP BY state

The desired result of this query is a list of bounding boxes to cover all urban areas around the US, grouped by state.


Solution

  • I created a feature request to get a native implementation of ST_EXTENT(). Please add your votes and evidence of why you need this function so the team can prioritize and keep you informed of any developments:

    In the meantime, the best solution I can offer:

    • fhoffa.x.st_bounding_box(): a naive bounding box UDF.

    Use it like this:

    SELECT REGEXP_EXTRACT(name, ', (..)') state
      , fhoffa.x.st_bounding_box(ARRAY_AGG(urban_area_geom)).polygon 
      , COUNT(*) urban_areas
    FROM `bigquery-public-data.geo_us_boundaries.urban_areas`
    GROUP BY state
    

    enter image description here

    enter image description here The code behind it:

    CREATE OR REPLACE FUNCTION fhoffa.x.st_bounding_box(arr ANY TYPE) AS ((
      SELECT AS STRUCT *
        , ST_MakePolygon(ST_GeogFromText(FORMAT('LINESTRING(%f %f,%f %f,%f %f,%f %f)',minlon,minlat,maxlon,minlat,maxlon,maxlat,minlon, maxlat))) polygon
      FROM (
        SELECT MIN(m.min_x) minlon, MAX(m.max_x) maxlon , MIN(m.min_y) minlat, MAX(m.max_y) maxlat
        FROM (
          SELECT 
            (SELECT AS STRUCT MIN(x) min_x, MAX(x) max_x, MIN(y) min_y, MAX(y) max_y FROM UNNEST(coords)) m
          FROM (
            SELECT ARRAY(
              SELECT STRUCT(
                CAST(SPLIT(c, ', ')[OFFSET(0)] AS FLOAT64) AS x, 
                CAST(SPLIT(c, ', ')[OFFSET(1)] AS FLOAT64) AS y
              )
              FROM UNNEST(REGEXP_EXTRACT_ALL(ST_ASGEOJSON(geog), r'\[([^[\]]*)\]')) c
            ) coords
            FROM UNNEST(arr) geog
          )
        )
      )
    ))
    

    Notes:

    • Additional effort is needed to make it work with geometries that cross the -180 line.

    • Due to geodesic edges, the function result is not a true bounding box, i.e. ST_Covers(box, geom) might return FALSE.

    • In the picture above I'm not expecting each state to be fully covered, just its urban areas. So the bounding box is correct if there's no urban area in those uncovered corners.

    • The following polygon construction will give you exact "rectangles", but they become much more complex structures to work with.

    ST_GEOGFROMGEOJSON(
      FORMAT('{"type": "Polygon", "coordinates": [[[%f,%f],[%f,%f],[%f,%f],[%f,%f],[%f, %f]]]}'
      , minlon,minlat,maxlon,minlat,maxlon,maxlat,minlon,maxlat,minlon,minlat)
    )
    
    

    enter image description here

    I'll be looking forward to your comments and suggestions.