
How to compute a moving average using geopandas nearest neighbors?


I have a GeoDataFrame ("GDF") with a "values" column and a "geometry" column (the geometries are actual geographical regions), so each row represents a region.

The "values" column is zero in many rows, and a large number in some rows.

I need to compute a "moving average" (rolling average) over the nearest neighbors of each region, up to a certain "max_distance" (we can assume that the GDF uses a local projected CRS, so max_distance has a real meaning). That way, "averaged_values" would be neither zero nor very large in most regions, but somewhere in between.

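One way to do it might be something like this (pseudocode):

    # pseudocode: average "values" over each region's neighbours within 1000 CRS units
    for region in GDF:
        averaged_values = sjoin_nearest(GDF, GDF, max_distance=1000).values.mean()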

But I really don't know how to proceed.

The expected output would be a geodataframe with 3 columns: "values", "averaged_values", and "geometry".

Any ideas?


Solution

  • What you are trying to do is also called a spatial lag. The best way is to create a spatial weights matrix based on a distance threshold and then compute the lag, both using the libpysal library, which is part of the geopandas ecosystem.

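    import libpysal
    
    # create binary distance-band weights: two regions are neighbours if their
    # centroids lie within `threshold` (in CRS units, e.g. metres) of each other
    W = libpysal.weights.DistanceBand.from_dataframe(gdf, threshold=1000)
    
    # row-normalise weights so that each region's neighbour weights sum to 1
    W.transform = "r"
    
    # spatial lag: the mean of "values" over each region's neighbours
    gdf["averaged_values"] = libpysal.weights.lag_spatial(W, gdf["values"])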
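To make it clearer what those three libpysal calls compute, here is a minimal NumPy sketch of the same idea: a binary distance-band weights matrix, row-standardised, multiplied by the values. It assumes you already have projected x/y coordinates for each region (for example, centroids); the function name `distance_band_lag` and the `include_self` flag are hypothetical and not part of libpysal, and exact boundary handling at the threshold may differ from libpysal's.

```python
import numpy as np


def distance_band_lag(coords, values, threshold, include_self=False):
    """Hypothetical helper: average `values` over all points within `threshold`.

    Sketch of a row-standardised binary distance-band spatial lag.
    `coords` is an (n, 2) array of projected x/y coordinates (e.g. centroids).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # Pairwise Euclidean distances between all points, shape (n, n).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Binary weights: 1 if the other point is within the threshold, else 0.
    w = (dist <= threshold).astype(float)
    if not include_self:
        # A spatial lag normally leaves each observation out of its own average.
        np.fill_diagonal(w, 0.0)

    # Row-standardise so each row sums to 1; rows with no neighbours
    # ("islands") stay all-zero and therefore get a lag of 0.
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

    # Weighted sum of neighbour values = mean over neighbours.
    return w @ values


coords = [(0, 0), (500, 0), (5000, 0)]
values = [0, 100, 40]
print(distance_band_lag(coords, values, 1000))
print(distance_band_lag(coords, values, 1000, include_self=True))
```

With the default `include_self=False`, each region gets the mean of its neighbours only, and a region with no neighbours inside the threshold gets 0, which is what you would also expect from the row-standardised libpysal weights above. Passing `include_self=True` lets each region's own value enter the mean, which is closer to a classic moving average. To try it on a GeoDataFrame you could build the coordinates with something like `np.column_stack([gdf.centroid.x, gdf.centroid.y])`.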