Search code examples
pythonstatisticspolygonspatialgeopandas

Python geopandas - how to aggregate (or do some other statistics) values of points within a polygon?


I have an issue with geopandas and cannot come up with a smart solution myself...

I do have two geopandas dataframes. One contains point geometries (e.g. cities), the other polygon geometries (e.g. countries). Each point has a values (e.g. citizens), and I want to determine the total number of citizens within a polygon. They both have the same CRS.

Anyone out there who can provide me with a fast and generic way to code that in python?

I work with Python 3.7 and geopandas 0.7.0.

Many thanks in advance!


Solution

  • I think the best workflow you can have at that point is

    1. Spatial join to match your points with the polygon they belong to
    2. groupby your polygon id and agg('sum')

    You can find this answer on how to use within to select observations that fall somewhere, e.g. in Europe

    The syntax is the following:

    geopandas.sjoin(points, polygons, how="inner", op='within')
    

    Note: You need to have installed rtree to be able to perform such operations. If you need to install this dependency, use pip or conda to install it

    Example

    Let's take a merge between cities and countries and aggregate values by continent.

    import geopandas
    import numpy as np
    
    world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
    cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))
    

    We create a dummy variable in cities data:

    cities['pop'] = np.random.randint(1, 6, cities.shape[0])
    
    cities.head(3)
        name            geometry    pop
    0   Vatican City    POINT (12.45339 41.90328)   4
    1   San Marino      POINT (12.44177 43.93610)   1
    2   Vaduz           POINT (9.51667 47.13372)    1
    

    Spatial join is performed using the method given above:

    data_merged = geopandas.sjoin(cities, world, how="inner", op='within')
    
    data_merged.head(2)
        name_left   geometry    pop     index_right     pop_est     continent   name_right  iso_a3  gdp_md_est
    0   Vatican City    POINT (12.45339 41.90328)   4   141     62137802    Europe  Italy   ITA     2221000.0
    1   San Marino  POINT (12.44177 43.93610)   1   141     62137802    Europe  Italy   ITA     2221000.0
    

    Note that you end up with a point geopandas object if you want to plot, for instance.

    Then, you can use standard pandas syntax to compute statistics on your point data

    data_merged.groupby('continent').agg('sum')
    
        pop     index_right     pop_est     gdp_md_est
    continent               
    Africa  153     3430    1471168898  8227904.00
    Asia    129     4761    4462705033  56109347.77
    Europe  125     5392    1013640800  35541477.00
    North America   47  587     569302584   23362898.00
    Oceania     12  415     36220960    1401392.00
    South America   23  494     426315904   6381910.00