I have an issue with geopandas and cannot come up with a smart solution myself...
I do have two geopandas dataframes. One contains point geometries (e.g. cities), the other polygon geometries (e.g. countries). Each point has a values (e.g. citizens), and I want to determine the total number of citizens within a polygon. They both have the same CRS.
Anyone out there who can provide me with a fast and generic way to code that in python?
I work with Python 3.7 and geopandas 0.7.0.
Many thanks in advance!
I think the best workflow you can have at that point is
groupby
your polygon id and agg('sum')
You can find this answer on how to use within to select observations that fall somewhere, e.g. in Europe
The syntax is the following:
geopandas.sjoin(points, polygons, how="inner", op='within')
Note: You need to have installed rtree
to be able to perform such operations. If you need to install this dependency, use pip
or conda
to install it
Let's take a merge between cities and countries and aggregate values by continent.
import geopandas
import numpy as np
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))
We create a dummy variable in cities
data:
cities['pop'] = np.random.randint(1, 6, cities.shape[0])
cities.head(3)
name geometry pop
0 Vatican City POINT (12.45339 41.90328) 4
1 San Marino POINT (12.44177 43.93610) 1
2 Vaduz POINT (9.51667 47.13372) 1
Spatial join is performed using the method given above:
data_merged = geopandas.sjoin(cities, world, how="inner", op='within')
data_merged.head(2)
name_left geometry pop index_right pop_est continent name_right iso_a3 gdp_md_est
0 Vatican City POINT (12.45339 41.90328) 4 141 62137802 Europe Italy ITA 2221000.0
1 San Marino POINT (12.44177 43.93610) 1 141 62137802 Europe Italy ITA 2221000.0
Note that you end up with a point
geopandas object if you want to plot, for instance.
Then, you can use standard pandas
syntax to compute statistics on your point data
data_merged.groupby('continent').agg('sum')
pop index_right pop_est gdp_md_est
continent
Africa 153 3430 1471168898 8227904.00
Asia 129 4761 4462705033 56109347.77
Europe 125 5392 1013640800 35541477.00
North America 47 587 569302584 23362898.00
Oceania 12 415 36220960 1401392.00
South America 23 494 426315904 6381910.00