I have a GeoDataFrame ("GDF") with a "values" column and a "geometry" column (the geometries are actual geographical regions), so each row represents a region.
The "values" column is zero in many rows, and a large number in some rows.
I need to compute a "moving average" (rolling average) over the nearest neighbors within a certain "max_distance" (we can assume that the GDF has a locally projected CRS, so max_distance is a real distance). The averaged_values would then be neither zero nor very large in most regions, but something in between.
One way to do it would be
for region in GDF:
    averaged_values = sjoin_nearest(GDF, GDF, max_distance=1000).values.mean()
But really I don't know how to proceed.
The expected output would be a geodataframe with 3 columns: "values", "averaged_values", and "geometry".
Any ideas?
What you are trying to do is also called a spatial lag. The best way is to create a spatial weights matrix based on a set distance and then compute the lag, both using the libpysal library, which is part of the geopandas ecosystem.
import libpysal
# create weights
W = libpysal.weights.DistanceBand.from_dataframe(gdf, threshold=1000)
# row-normalise weights
W.transform = "r"
# create lag
gdf["averaged_values"] = libpysal.weights.lag_spatial(W, gdf["values"])
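To make it clearer what the row-normalised lag computes, here is a minimal pure-Python sketch of the same idea on point coordinates. It is not how libpysal implements it; `distance_band_lag`, `include_self`, and the sample `points` and `values` are hypothetical names and data used only for illustration. Each point gets the plain mean of the values of all other points within the threshold, and a point with no neighbours gets 0.0.

```python
from math import hypot

def distance_band_lag(points, values, threshold, include_self=False):
    """Mean of `values` over all points within `threshold` of each point.

    Mimics a binary distance-band weights matrix that has been
    row-normalised. Points with no neighbours get 0.0.
    """
    lagged = []
    for i, (xi, yi) in enumerate(points):
        neighbours = [
            values[j]
            for j, (xj, yj) in enumerate(points)
            if (j != i or include_self) and hypot(xi - xj, yi - yj) <= threshold
        ]
        lagged.append(sum(neighbours) / len(neighbours) if neighbours else 0.0)
    return lagged

# Hypothetical centroids (in metres) and values: three nearby points, one isolated.
points = [(0, 0), (500, 0), (900, 0), (5000, 0)]
values = [0, 90, 0, 30]

neighbours_only = distance_band_lag(points, values, threshold=1000)
with_self = distance_band_lag(points, values, threshold=1000, include_self=True)

print(neighbours_only)  # [45.0, 0.0, 45.0, 0.0]
print(with_self)        # [30.0, 30.0, 30.0, 30.0]
```

Note that the lag from the weights matrix above averages over the neighbours only, so a region's own value is not part of its average (the `neighbours_only` case). If you want a moving average that includes the region itself, I believe `libpysal.weights.fill_diagonal` can add self-weights before you set the transform, but check the libpysal documentation for your version.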