Search code examples
google-bigquerypostgisgeohashing

Geospatial rolling average from geohashes on BigQuery


Geohashes form a grid, much like a matrix.

If I have a table with a geohash7 and a value, how could I calculate say the average of all the adjacentcells on BigQuery?

Sample source table

geohash7 value
r1q6fh3 7
r1q6fh6 2
r1q6fh7 5
r1q6fh1 7
r1q6fh4 2
r1q6fh5 6
r1q6f5c 4
r1q6f5f 2
r1q6f5g 7

Sample output table

geohash7 value
r1q6fh4 4.6
... ...

Ideally in a scalable way as I want to run this over a large landmass.


Solution

  • First you'll need a function that returns neighbors. It is probably easier to grab one of javascript geohash libraries and use it as UDF, rather than do bit fiddling in SQL.

    Once you have a function that returns neighbors, say

    GeoHashNeibors(center STRING) RETURNS ARRAY<STRING>

    You can then join original table with the flattened list of neighbors and compute average. Something like

    WITH with_neighbors AS (
      select geohash, value, neighbor
      from data d, unnest(GeoHashNeibors(d.geohash)) neighbor
    ),
    joined AS (
      select t1.geohash, t1.value, t1.neighbor, t2.value as n_value
      from with_neighbors t1 left join data t2
      on t1.neighbor = t2.geohash
    )
    SELECT geohash, ANY_VALUE(value) value, AVG(n_value) AS avg_neighbor
    FROM joined 
    GROUP BY 1