Tags: apache-spark, apache-spark-sql, databricks, geospatial, azure-databricks

Databricks geospatial (latitude/longitude) plot


Is it possible to visualize (latitude, longitude) data on a world map in an Azure Databricks notebook? A scatter plot is the closest I have gotten. The documentation mentions a Map (Markers) visualization, but I couldn't find it. Other approaches are also welcome.

Here is an example of loading geospatial data, taken from the Databricks documentation:

%sql
CREATE TABLE IF NOT EXISTS default.sf_fire_incidents
USING com.databricks.spark.csv
OPTIONS (PATH 'dbfs:/databricks-datasets/learning-spark-v2/sf-fire/sf-fire-incidents.csv',
         HEADER "true",
         INFERSCHEMA "true");

SELECT location,
   `Estimated Property Loss`,
   coordinates[0]  AS LAT,
   coordinates[1] AS LONG
 FROM (SELECT location,
        `Estimated Property Loss`,
         split(btrim(location, '() '), ', ') AS coordinates
       FROM default.sf_fire_incidents
       ORDER BY `Estimated Property Loss` DESC,
                location)
 LIMIT 2000;

A scatter plot of (LAT, LONG) is the closest I have, but I am unable to show the positions on a world map.


Solution

  • I think your best bet would be the Python library folium.
    Link to the docs: https://python-visualization.github.io/folium/quickstart.html#Markers
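
    Folium is not preinstalled on every cluster, so if the import fails you can add it as a notebook-scoped library first (assuming a Databricks runtime that supports the %pip magic, which must be the first line of its own cell):

    %pip install folium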

    You will start by initializing your map object.

    import folium
    # location is where the map will be centered at the beginning
    m = folium.Map(location=[45.372, -121.6972], zoom_start=12, tiles="Stamen Terrain")
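
    If the "Stamen Terrain" tiles do not load in your folium version (the Stamen tile sets have since moved to a third-party provider), the default OpenStreetMap tiles are a safe fallback, for example:

    # fallback: default OpenStreetMap tiles, no external tile provider needed
    m = folium.Map(location=[45.372, -121.6972], zoom_start=12)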
    

    Then convert your query result to a pandas DataFrame, for example:

    YOUR_QUERY = """
    SELECT location,
       `Estimated Property Loss`,
       coordinates[0]  AS LAT,
       coordinates[1] AS LONG
     FROM (SELECT location,
            `Estimated Property Loss`,
             split(btrim(location, '() '), ', ') AS coordinates
           FROM default.sf_fire_incidents
           ORDER BY `Estimated Property Loss` DESC,
                    location)
     LIMIT 2000;
    """
    pdf = spark.sql(YOUR_QUERY).toPandas()
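
    Note that split() produces strings, so LAT and LONG come back as text; an optional cleanup step (a sketch, assuming pandas is available, as it is on Databricks) converts them to numbers and drops rows that cannot be parsed:

    import pandas as pd

    # coerce the string coordinates to floats and drop unparseable rows
    pdf["LAT"] = pd.to_numeric(pdf["LAT"], errors="coerce")
    pdf["LONG"] = pd.to_numeric(pdf["LONG"], errors="coerce")
    pdf = pdf.dropna(subset=["LAT", "LONG"])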
    

    Finally, iterate over the rows and add a marker for each one to the map object:

    for index, row in pdf.iterrows():
        folium.Marker([row["LAT"], row["LONG"]], popup=f"{index}").add_to(m)
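
    With roughly 2000 markers the map can get sluggish; as an optional alternative, folium's MarkerCluster plugin groups nearby points (a sketch, assuming folium.plugins is available in your install):

    from folium.plugins import MarkerCluster

    # cluster nearby markers so a few thousand points stay responsive
    cluster = MarkerCluster().add_to(m)
    for index, row in pdf.iterrows():
        folium.Marker([row["LAT"], row["LONG"]], popup=f"{index}").add_to(cluster)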
    

    Don't forget to show the map at the end by simply evaluating the map object as the last expression in the cell:

    m
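
    If evaluating m does not render the map inline in your Databricks runtime, passing its HTML representation to displayHTML is a common workaround (a sketch; displayHTML is the built-in Databricks notebook helper):

    # render the folium map as HTML inside the notebook cell
    displayHTML(m._repr_html_())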