Search code examples
pythongeospatialgeopandasbinning

Spatial binning from a spatial dataframe using geopandas (Python)


I want to do a spatial binning (using median as aggregation function) starting from a CSV file containing pollutant values measured at positions long and lat.
The resulting map should be something as:

enter image description here

But for data applied to a city's extent. At this regard I found this tutorial that is close to what I want to do, but I was not able to get the desired result. I think that I'm missing something on how to correctly use dissolve and plot the resulting data (better using Folium) Any useful example code?


Solution

    • you have not provided sample data. So I have used global earthquakes as set of points and geometry of California for scope / extent
    • it's simple to create grid using shapely.geometry.box()
    • I have shown use of median and also another aggfunc to demonstrate multiple metrics can be calculated
    • have used folium to plot. This feature is new in geopandas 0.10.0 https://geopandas.org/en/stable/docs/user_guide/interactive_mapping.html
    import geopandas as gpd
    import shapely.geometry
    import numpy as np
    
    # equivalent of CSV, all earthquake points globally
    gdf_e = gpd.read_file(
        "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.geojson"
    )
    
    # get geometry of bounding area.  Have selected a state rather than a city
    gdf_CA = gpd.read_file(
        "https://raw.githubusercontent.com/glynnbird/usstatesgeojson/master/california.geojson"
    ).loc[:, ["geometry"]]
    
    BOXES = 50
    a, b, c, d = gdf_CA.total_bounds
    
    # create a grid for Califormia, could be a city
    gdf_grid = gpd.GeoDataFrame(
        geometry=[
            shapely.geometry.box(minx, miny, maxx, maxy)
            for minx, maxx in zip(np.linspace(a, c, BOXES), np.linspace(a, c, BOXES)[1:])
            for miny, maxy in zip(np.linspace(b, d, BOXES), np.linspace(b, d, BOXES)[1:])
        ],
        crs="epsg:4326",
    )
    
    # remove grid boxes created outside actual geometry
    gdf_grid = gdf_grid.sjoin(gdf_CA).drop(columns="index_right")
    
    # get earthquakes that have occured within one of the grid geometries
    gdf_e_CA = gdf_e.loc[:, ["geometry", "mag"]].sjoin(gdf_grid)
    # get median magnitude of eargquakes in grid
    gdf_grid = gdf_grid.join(
        gdf_e_CA.dissolve(by="index_right", aggfunc="median").drop(columns="geometry")
    )
    # how many earthquakes in the grid
    gdf_grid = gdf_grid.join(
        gdf_e_CA.dissolve(by="index_right", aggfunc=lambda d: len(d))
        .drop(columns="geometry")
        .rename(columns={"mag": "number"})
    )
    
    # drop grids geometries that have no measures and create folium map
    m = gdf_grid.dropna().explore(column="mag")
    # for good measure - boundary on map too
    gdf_CA["geometry"].apply(lambda g: shapely.geometry.MultiLineString([p.exterior for p in g.geoms])).explore(m=m)
    

    enter image description here