Tags: apache-spark, apache-spark-sql, databricks, geospatial, azure-databricks

Databricks geospatial (latitude/longitude) plot


Is it possible to visualize (latitude, longitude) data on a world map in an Azure Databricks notebook? A scatter plot is the closest I have gotten. The documentation mentions a Map (Markers) visualization, but I couldn't find it. Other approaches are also welcome.

Here is an example of loading geospatial data, taken from the Databricks documentation:

%sql
CREATE TABLE IF NOT EXISTS default.sf_fire_incidents
USING com.databricks.spark.csv
OPTIONS (PATH 'dbfs:/databricks-datasets/learning-spark-v2/sf-fire/sf-fire-incidents.csv',
         HEADER "true",
         INFERSCHEMA "true");

SELECT location,
   `Estimated Property Loss`,
   coordinates[0]  AS LAT,
   coordinates[1] AS LONG
 FROM (SELECT location,
        `Estimated Property Loss`,
         split(btrim(location, '() '), ', ') AS coordinates
       FROM default.sf_fire_incidents
       ORDER BY `Estimated Property Loss` DESC,
                location)
 LIMIT 2000;

A scatter plot of (LAT, LONG) is the closest I have, but I am unable to show the positions on a world map.


Solution

  • I think your best bet would be the Python library folium.
    Link to the docs: https://python-visualization.github.io/folium/quickstart.html#Markers
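
    Folium is not preinstalled on every cluster, so if the import fails you can add it as a notebook-scoped library first (assuming a Databricks runtime that supports the %pip magic, which must be the first line of its own cell):

    %pip install folium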

    You will start by initializing your map object.

    import folium
    # location is where the map will be centered at the beginning
    m = folium.Map(location=[45.372, -121.6972], zoom_start=12, tiles="Stamen Terrain")
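
    If the "Stamen Terrain" tiles do not load in your folium version (the Stamen tile sets have since moved to a third-party provider), the default OpenStreetMap tiles are a safe fallback, for example:

    # fallback: default OpenStreetMap tiles, no external tile provider needed
    m = folium.Map(location=[45.372, -121.6972], zoom_start=12)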
    

    Then convert your query result to a pandas DataFrame, for example:

    YOUR_QUERY = """
    SELECT location,
       `Estimated Property Loss`,
       coordinates[0]  AS LAT,
       coordinates[1] AS LONG
     FROM (SELECT location,
            `Estimated Property Loss`,
             split(btrim(location, '() '), ', ') AS coordinates
           FROM default.sf_fire_incidents
           ORDER BY `Estimated Property Loss` DESC,
                    location)
     LIMIT 2000;
    """
    pdf = spark.sql(YOUR_QUERY).toPandas()
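
    Note that split() produces strings, so LAT and LONG come back as text; an optional cleanup step (a sketch, assuming pandas is available, as it is on Databricks) converts them to numbers and drops rows that cannot be parsed:

    import pandas as pd

    # coerce the string coordinates to floats and drop unparseable rows
    pdf["LAT"] = pd.to_numeric(pdf["LAT"], errors="coerce")
    pdf["LONG"] = pd.to_numeric(pdf["LONG"], errors="coerce")
    pdf = pdf.dropna(subset=["LAT", "LONG"])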
    

    Finally, iterate over the rows and add a marker for each one to the map object:

    for index, row in pdf.iterrows():
        folium.Marker([row["LAT"], row["LONG"]], popup=f"{index}").add_to(m)
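
    With roughly 2000 markers the map can get sluggish; as an optional alternative, folium's MarkerCluster plugin groups nearby points (a sketch, assuming folium.plugins is available in your install):

    from folium.plugins import MarkerCluster

    # cluster nearby markers so a few thousand points stay responsive
    cluster = MarkerCluster().add_to(m)
    for index, row in pdf.iterrows():
        folium.Marker([row["LAT"], row["LONG"]], popup=f"{index}").add_to(cluster)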
    

    Don't forget to show the map at the end by simply evaluating the map object as the last expression in the cell:

    m
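
    If evaluating m does not render the map inline in your Databricks runtime, passing its HTML representation to displayHTML is a common workaround (a sketch; displayHTML is the built-in Databricks notebook helper):

    # render the folium map as HTML inside the notebook cell
    displayHTML(m._repr_html_())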