Search code examples
pythongisarcgisshapelysimplify

Simplify Polygon shapefile to reduce file size in python


I have a polygon shapefile which was vectorized from a raster layer (second image). This shapefile contains thousands of features with just one column which represent polygon level (ranging 1-5). The file size is massive and so I was trying to use Shapely's simplify tool to reduce the file size. The aim is to get to a similar result to ArcGIS pro's simplify polygon tool, which looks great and reduced the file size by 80% (third image).

So far the Shapely tool had produced a pretty similar result but it's still not perfect because bordering polygons do not align great smoothly (see first image).

I assume I need to use some sort of smoothing or interpolation function to sort it, but I can't seem to find one that works.

the code I have so far:

from shapely.geometry import shape, mapping
from shapely.ops import transform
from shapely.validation import make_valid
import fiona
import pyproj

# Define the projections
project_to_utm = pyproj.Transformer.from_crs("EPSG:4326", "EPSG:32633", always_xy=True).transform
project_to_wgs84 = pyproj.Transformer.from_crs("EPSG:32633", "EPSG:4326", always_xy=True).transform

# Open your shapefile
with fiona.open(r"shapefile.shp", 'r') as source:
    schema = source.schema
    crs = source.crs
    features = list(source)

    # Reproject to UTM, simplify, then reproject back to WGS84
    simplified_features = []
    for feature in features:
        # Reproject to UTM
        geom = shape(feature['geometry'])
        geom_utm = transform(project_to_utm, geom)

        # Simplify in UTM
        simplified_geom_utm = geom_utm.simplify(tolerance=0.5, preserve_topology=True)

        # Fix any invalid geometries (self-intersections)
        if not simplified_geom_utm.is_valid:
            simplified_geom_utm = make_valid(simplified_geom_utm)

        # Reproject back to WGS84
        simplified_geom_wgs84 = transform(project_to_wgs84, simplified_geom_utm)

        # Append the simplified and validated geometry
        simplified_features.append({
            'geometry': mapping(simplified_geom_wgs84),
            'properties': feature['properties']
        })

# Save the simplified polygons to a new shapefile
with fiona.open(r"shapfile_simplified.shp", 'w', driver='ESRI Shapefile', schema=schema, crs=crs) as output:
    for feature in simplified_features:
        output.write(feature)

Output vector from python

Original Vector after vectorizing a raster

Output vector from ArcGIS pro (simplify polygon tool)


Solution

  • The problem you are facing is that shapely at the moment only can simplify geometries one by one. Because of this, gaps and slivers can/will appear between adjacent polygons because different points might be removed on the adjacent borders of the polygons.

    To avoid this, you need "topology-aware" simplification. This typically means that first the common boundaries are determined. Then these common boundary lines are simplified. Finally the polygons are reconstructed. This way the common boundaries will stay common without gaps and slivers.

    A library that supports this way of simplifying is topojson, via the toposimplify function.

    Because toposimplify requires all geometries to be in memory anyway, I used geopandas to load and save the geometries in the sample script as this will be easier and more efficient.

    Sample code:

    import geopandas as gpd
    import topojson
    
    input_path = r"shapefile.shp"
    output_path = r"shapefile_simplified.shp"
    
    # Read the shapefile
    input_gdf = gpd.read_file(input_path)
    
    # Reproject to UTM
    input_gdf = input_gdf.to_crs(32633)
    
    # Convert to topology, simplify and convert back to GeoDataFrame
    topo = topojson.Topology(input_gdf, prequantize=False)
    topo_simpl = topo.toposimplify(0.5)
    simpl_gdf = topo_simpl.to_gdf()
    
    # Fix any invalid geometries (self-intersections)
    simpl_gdf.geometry = simpl_gdf.geometry.make_valid()
    
    # Reproject back to WGS84
    simpl_gdf = simpl_gdf.to_crs(4326)
    
    # Write to output file
    simpl_gdf.to_file(output_path)