
How to get geospatial POINT using SparkSQL


I'm converting a process from PostgreSQL over to Databricks (Apache Spark).

The PostgreSQL process uses the following SQL expression to get the point on a map from an X and a Y value: ST_Transform(ST_SetSrid(ST_MakePoint(x, y), 4326), 3857)
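For context on what that expression actually computes: it builds a longitude/latitude point (EPSG:4326) and reprojects it into Web Mercator metres (EPSG:3857). Below is a minimal pure-Python sketch of that reprojection using the standard spherical Web Mercator formulas; the function name is our own, and a real pipeline would use a geospatial library rather than hand-rolled math.

```python
import math

# WGS84 semi-major axis, the sphere radius used by EPSG:3857
EARTH_RADIUS_M = 6378137.0

def lonlat_to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Sketch of ST_Transform(ST_SetSrid(ST_MakePoint(lon, lat), 4326), 3857).

    Note ST_MakePoint(x, y) takes longitude first, then latitude.
    """
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

print(lonlat_to_web_mercator(0.0, 0.0))    # (0.0, 0.0)
print(lonlat_to_web_mercator(180.0, 0.0))  # x is ~20037508.34, the edge of the projection
```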

Does anyone know how I can achieve this same logic in Spark SQL on Databricks?


Solution

  • To achieve this you need to use a library such as Apache Sedona or GeoMesa. Sedona, for example, has the ST_Transform function, and most likely the rest as well.

    The only thing you need to take care of is that if you're using pure SQL on Databricks, you will need to:

    • install the Sedona libraries using an init script, so the libraries are in place before Spark starts
    • set the Spark configuration parameters described in the following pull request
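    Once Sedona is installed and its SQL functions are registered, the PostGIS expression maps over fairly directly. The following is a sketch, not a tested query: Sedona's ST_Transform takes explicit source and target CRS strings (so a separate ST_SetSrid call is not needed), and the exact signature and CRS-string format vary between Sedona versions, so check the docs for your release.

    ```sql
    -- Hypothetical Sedona SQL equivalent of
    -- ST_Transform(ST_SetSrid(ST_MakePoint(x, y), 4326), 3857)
    SELECT ST_Transform(ST_Point(x, y), 'epsg:4326', 'epsg:3857') AS geom_3857
    FROM input_table
    ```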

    Update, June 2022: people at Databricks have developed the Mosaic library, which is heavily optimized for geospatial analysis on Databricks and is compatible with the standard ST_ functions.