I am trying to aggregate data for road segments to a fixed grid using geopandas.
The road data looks like this:
|Variable | geometry |
|---------|----------|
| 59 |LINESTRING (440735.351 4319767.843, 440733.320...|
| 48 |LINESTRING (440733.320 4319607.463, 440732.508...|
| 64 |LINESTRING (440858.641 4329887.089, 440853.551...|
| 64 |LINESTRING (440920.578 4330030.844, 440890.661...|
| 68 |LINESTRING (218573.705 4257347.137, 218586.697...|
The fixed grid I want to aggregate the road data to looks like this:
|geometry|
|--------|
| POLYGON ((238749.978 4498509.611, 238898.825 4..|
| POLYGON ((240086.217 4498360.636, 240234.824 4... |
| POLYGON ((241421.845 4498211.925, 241570.857 4..|
| POLYGON ((242758.152 4498063.434, 242906.923 4... |
| POLYGON ((244094.479 4497914.762, 244243.009 4... |
I am currently weighting the variable by road length, applying a spatial join, multiplying the weighted variable by the road length within each grid cell, and summing over all roads in each cell, using this code:
gdf_road['weighted_variable'] = gdf_road['Variable'] / gdf_road.geometry.length
gdf_joined = gpd.sjoin(gdf_road, gdf_grid, how="inner", predicate='intersects')
gdf_grid['agg_variable'] = (gdf_joined['weighted_variable'] * gdf_joined.geometry.length).groupby(gdf_joined['index_right']).sum()
The result preserves the spatial structure of the original road data, but the magnitude of the variable seems erroneously high. I am curious whether I have overlooked something in my logic for aggregating road data to a grid, perhaps double counting overlapping roads or needing a different weighting scheme. Any suggestions would be great. Thank you!
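A quick way to test the double-counting suspicion is a toy case with a known answer. The sketch below is only an illustration: the single road, the two grid cells, and the CRS are made-up values, and it assumes a geopandas version where `sjoin` accepts `predicate`. It applies the same weighting and join as above and compares the summed result with the original total.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Hypothetical toy data: one 10-unit road with Variable = 50 that crosses
# the shared edge (x = 7) of two grid cells.
gdf_road = gpd.GeoDataFrame(
    {"Variable": [50.0]},
    geometry=[LineString([(1, 2), (11, 2)])],
    crs="EPSG:26916",
)
gdf_grid = gpd.GeoDataFrame(
    geometry=[box(0, 0, 7, 5), box(7, 0, 14, 5)],
    crs="EPSG:26916",
)

gdf_road["weighted_variable"] = gdf_road["Variable"] / gdf_road.geometry.length
gdf_joined = gpd.sjoin(gdf_road, gdf_grid, how="inner", predicate="intersects")

# Each joined row still carries the full road geometry, so this length is the
# whole road, not just the part inside the matched cell.
gdf_joined["contribution"] = (
    gdf_joined["weighted_variable"] * gdf_joined.geometry.length
)
per_cell = gdf_joined.groupby("index_right")["contribution"].sum()

total_after_join = per_cell.sum()
true_total = gdf_road["Variable"].sum()
print(per_cell)
print("after join:", total_after_join, "original:", true_total)
```

If the total after the join exceeds the original total, roads that span more than one cell are being counted once per cell.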
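`gpd.overlay` seemed to do the trick. `sjoin` only matches roads to grid cells and keeps each road's full geometry, so a road that crosses several cells contributes its entire length to every one of them. `overlay` with `how="intersection"` clips each road to the cell boundaries first, so only the length inside each cell is counted: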
gdf_road['weighted_variable'] = gdf_road['Variable'] / gdf_road.geometry.length
# Clip each road to the grid cells it crosses
gdf_join = gpd.overlay(gdf_road, gdf_grid, how="intersection")
# Length of each clipped piece, measured in EPSG:26916
gdf_join['new_length'] = gdf_join.to_crs('EPSG:26916').geometry.length
gdf_join['new_variable'] = gdf_join['weighted_variable'] * gdf_join['new_length']
# Spatial join & aggregate data
joined_data = gpd.sjoin(gdf_join, gdf_grid, how="inner", predicate="within")
grouped_data = joined_data.groupby('index_right')
agg_data = grouped_data[['new_variable']].sum().reset_index()
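A possible variation, sketched here on made-up data, is to give each grid cell an identifier before the overlay so the clipped pieces can be grouped directly, without the second spatial join. The `cell_id` column, the toy geometries, and the CRS are assumptions for illustration only and are not part of the original data.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Hypothetical toy data: one 10-unit road (Variable = 50) and three cells;
# the road lies 6 units in the first cell and 4 units in the second.
gdf_road = gpd.GeoDataFrame(
    {"Variable": [50.0]},
    geometry=[LineString([(1, 2), (11, 2)])],
    crs="EPSG:26916",
)
gdf_grid = gpd.GeoDataFrame(
    geometry=[box(0, 0, 7, 5), box(7, 0, 14, 5), box(14, 0, 21, 5)],
    crs="EPSG:26916",
)
gdf_grid["cell_id"] = gdf_grid.index  # assumed identifier column

gdf_road["weighted_variable"] = gdf_road["Variable"] / gdf_road.geometry.length

# Clip roads to cells; each piece inherits the attributes of both inputs,
# including cell_id. keep_geom_type=True keeps only line results.
pieces = gpd.overlay(gdf_road, gdf_grid, how="intersection", keep_geom_type=True)
pieces["new_variable"] = pieces["weighted_variable"] * pieces.geometry.length

# Sum per cell and attach to the grid; cells with no roads get 0.
agg = pieces.groupby("cell_id")["new_variable"].sum()
gdf_grid["agg_variable"] = gdf_grid["cell_id"].map(agg).fillna(0.0)
print(gdf_grid[["cell_id", "agg_variable"]])
```

Because every piece is clipped before its length is measured, the per-cell values should add up to the original total of `Variable`, which makes a convenient sanity check on real data.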