python gis geopandas

Expanding GeoPandas Multipolygon Dataframe To One Poly Per Line


This question is similar to an existing one, but none of the solutions there worked for me. I have included several of those attempts and their results below. If another library can achieve this, I am open to it.

I am trying to expand a GeoJSON file with GeoPandas. The file contains several MultiPolygons, and I want one polygon per row.

Current geodataframe (3 Rows)

fill    fill-opacity    stroke  stroke-opacity  stroke-width    title   geometry
0   #9bf1e2 0.3 #9bf1e2 1   1   Hail Possible   (POLYGON ((-80.69500140880155 22.2885709067316...
1   #08c1e6 0.3 #08c1e6 1   1   Severe Hail (POLYGON ((-103.4850007575523 29.2010260633722...
2   #682aba 0.3 #682aba 1   1   Damaging Hail   (POLYGON ((-104.2750007349772 32.2629245180204...

Desired geodataframe (200+ Rows)

fill    fill-opacity    stroke  stroke-opacity  stroke-width    title   geometry
0   #9bf1e2 0.3 #9bf1e2 1   1   Hail Possible   (POLYGON ((-80.69500140880155 22.2885709067316...
1   #9bf1e2 0.3 #9bf1e2 1   1   Hail Possible   (POLYGON ((-102.8150007766983 28.2180513479277...
2   #9bf1e2 0.3 #9bf1e2 1   1   Hail Possible   (POLYGON ((-103.4850007575523 29.0940821135748...
3   #9bf1e2 0.3 #9bf1e2 1   1   Hail Possible   (POLYGON ((-103.5650007552662 30.9947420843694...
4   #9bf1e2 0.3 #9bf1e2 1   1   Hail Possible   (POLYGON ((-103.6150007538374 31.0173836504729...

Sample GeoJSON file being used: https://drive.google.com/file/d/1m6cMR4jF3QWp07e23sIdb0UF9xLD062s/view?usp=sharing

What I've tried, with no success:

df3.set_index(['title'])['geometry'].apply(pd.Series).stack().reset_index()

(Returns original unchanged gdf)

def cartesian(x): 
    return np.vstack(np.array([np.array(np.meshgrid(*i)).T.reshape(-1,7) for i in x.values]))
ndf = pd.DataFrame(cartesian(df3),columns=df3.columns)

(Returns original unchanged gdf)

import geopandas as gpd
from shapely.geometry.polygon import Polygon
from shapely.geometry.multipolygon import MultiPolygon

def explode(indata):
    indf = gpd.GeoDataFrame.from_file(indata)
    outdf = gpd.GeoDataFrame(columns=indf.columns)
    for idx, row in indf.iterrows():
        if type(row.geometry) == Polygon:
            outdf = outdf.append(row,ignore_index=True)
        if type(row.geometry) == MultiPolygon:
            multdf = gpd.GeoDataFrame(columns=indf.columns)
            recs = len(row.geometry)
            multdf = multdf.append([row]*recs,ignore_index=True)
            for geom in range(recs):
                multdf.loc[geom,'geometry'] = row.geometry[geom]
            outdf = outdf.append(multdf,ignore_index=True)
    return outdf

explode(GEOJSONFILE)

(Returns original unchanged gdf)

This is my first question on here so if any additional info or details are needed please let me know.

UPDATE: The problem with the explode() function turned out to be a formatting issue in the file: the geometry was essentially a multi-polygon nested inside a multi-polygon, so the loop only processed the first multi-polygon. The explode function itself works.
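For anyone hitting the same nesting problem, here is a rough sketch of flattening the raw GeoJSON before loading it, using only plain dictionaries. The exact malformed structure of my file is not shown here, so this assumes the nesting appears as a GeometryCollection wrapping MultiPolygons; the names iter_polygons and explode_features and the sample data are made up for illustration.

```python
import copy


def iter_polygons(geometry):
    """Yield Polygon geometry dicts from a possibly nested GeoJSON geometry."""
    gtype = geometry["type"]
    if gtype == "Polygon":
        yield geometry
    elif gtype == "MultiPolygon":
        # Each entry of a MultiPolygon's coordinates is one polygon's ring list.
        for rings in geometry["coordinates"]:
            yield {"type": "Polygon", "coordinates": rings}
    elif gtype == "GeometryCollection":
        # Assumed form of the nesting: recurse into every member geometry.
        for member in geometry["geometries"]:
            yield from iter_polygons(member)
    # Other geometry types (points, lines) are skipped in this sketch.


def explode_features(collection):
    """Return a FeatureCollection with one Polygon per feature."""
    features = []
    for feature in collection["features"]:
        for polygon in iter_polygons(feature["geometry"]):
            features.append({
                "type": "Feature",
                # Every part keeps a copy of the parent's properties.
                "properties": copy.deepcopy(feature.get("properties")),
                "geometry": polygon,
            })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample: one feature whose MultiPolygon is wrapped in a collection.
sample = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"title": "Hail Possible"},
        "geometry": {
            "type": "GeometryCollection",
            "geometries": [{
                "type": "MultiPolygon",
                "coordinates": [
                    [[[0, 0], [1, 0], [1, 1], [0, 0]]],
                    [[[2, 2], [3, 2], [3, 3], [2, 2]]],
                ],
            }],
        },
    }],
}

flat = explode_features(sample)
print(len(flat["features"]))
```

The flattened dictionary can then be written back out with json.dump and read as usual. If the file nests geometries some other way, iter_polygons would need a matching branch.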


Solution

  • You can use the GeoPandas explode() method.

    exploded = original_df.explode()
    

    Copied from the docstring:

        Explode multi-part geometries into multiple single geometries.
    
        Each row containing a multi-part geometry will be split into
        multiple rows with single geometries, thereby increasing the vertical
        size of the GeoDataFrame.
        The index of the input geodataframe is no longer unique and is
        replaced with a multi-index (original index with additional level
        indicating the multiple geometries: a new zero-based index for each
        single part geometry per multi-part geometry).
    
        Returns
        -------
        GeoDataFrame
            Exploded geodataframe with each single geometry
            as a separate entry in the geodataframe.
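
A small self-contained sketch of what that looks like in practice. The geometries and titles below are made up for illustration, and the index_parts argument is assumed to be available (it exists in recent GeoPandas releases; on older versions a plain explode() already returns the multi-index described above).

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

# Illustrative geometries: one two-part MultiPolygon and one plain Polygon.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
shifted = Polygon([(2, 2), (3, 2), (3, 3), (2, 3)])
lone = Polygon([(5, 5), (6, 5), (6, 6), (5, 6)])

gdf = gpd.GeoDataFrame(
    {"title": ["Hail Possible", "Severe Hail"]},
    geometry=[MultiPolygon([square, shifted]), lone],
)

# index_parts=True keeps the (original row, part number) multi-index.
exploded = gdf.explode(index_parts=True)

# Drop the multi-index to get a plain 0..n-1 index, one polygon per row.
flat = exploded.reset_index(drop=True)

print(flat)
```

The attribute columns are repeated for each part, so every resulting row carries the title of the multi-polygon it came from.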