Is it possible to visualize (latitude, longitude) data on a world map in an Azure Databricks notebook? A scatter plot is the closest I got. The documentation mentions the existence of a Map (Markers) chart type, but I couldn't find it. Other possibilities are also welcome.
Here is an example of loading geospatial data, adapted from the Databricks documentation:
%sql
CREATE TABLE IF NOT EXISTS default.sf_fire_incidents
USING com.databricks.spark.csv
OPTIONS (PATH 'dbfs:/databricks-datasets/learning-spark-v2/sf-fire/sf-fire-incidents.csv',
         HEADER "true",
         INFERSCHEMA "true");

SELECT location,
       `Estimated Property Loss`,
       coordinates[0] AS LAT,
       coordinates[1] AS LONG
FROM (SELECT location,
             `Estimated Property Loss`,
             split(btrim(location, '() '), ', ') AS coordinates
      FROM default.sf_fire_incidents
      ORDER BY `Estimated Property Loss` DESC,
               location)
LIMIT 2000;
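The `btrim`/`split` step above just strips the surrounding parentheses and splits on the comma. The same transformation can be sketched in plain Python (the sample string below is a hypothetical value in the dataset's `"(lat, long)"` format, not an actual row):

```python
def parse_location(location: str) -> tuple[float, float]:
    """Strip surrounding '(', ')' and spaces, then split on ', ' --
    the same transformation the SQL btrim/split pair performs."""
    lat_str, lon_str = location.strip("() ").split(", ")
    return float(lat_str), float(lon_str)

# Hypothetical example value in the same "(lat, long)" format:
lat, lon = parse_location("(37.7749, -122.4194)")
print(lat, lon)  # 37.7749 -122.4194
```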
A scatter plot of (LAT, LONG) is the closest I have, but I am unable to show the positions on a world map.
I think your best bet would be using the Python library folium.
link to docs : https://python-visualization.github.io/folium/quickstart.html#Markers
You will start by initializing your map object.
import folium
# location sets where the map is centered at the beginning.
# (The "Stamen Terrain" tiles used in older examples have been retired;
# the default "OpenStreetMap" tiles work out of the box.)
m = folium.Map(location=[45.372, -121.6972], zoom_start=12)
Then convert your query result to a pandas DataFrame, for example:
YOUR_QUERY = """
SELECT location,
       `Estimated Property Loss`,
       coordinates[0] AS LAT,
       coordinates[1] AS LONG
FROM (SELECT location,
             `Estimated Property Loss`,
             split(btrim(location, '() '), ', ') AS coordinates
      FROM default.sf_fire_incidents
      ORDER BY `Estimated Property Loss` DESC,
               location)
LIMIT 2000
"""
pdf = spark.sql(YOUR_QUERY).toPandas()
And finally, iterate over the rows and add a marker for each one to the map object:
for index, row in pdf.iterrows():
    # LAT and LONG come back as strings from the SQL split(), so cast to float
    folium.Marker([float(row["LAT"]), float(row["LONG"])],
                  popup=f"{index}").add_to(m)
Don't forget to show the map at the end by evaluating the object as the last expression of the cell:
m
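If evaluating `m` as the last expression doesn't render anything in your Databricks notebook, a common workaround (an assumption on my part, not something the original answer covers) is to pass the map's HTML to Databricks' built-in `displayHTML` function:

```python
# Render the folium map inside a Databricks notebook cell.
# displayHTML is a Databricks notebook built-in; _repr_html_() is folium's
# standard HTML representation of the map. This only runs inside Databricks.
displayHTML(m._repr_html_())
```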