python pandas dataframe geopandas

Combining rows in a GeoPandas GeoDataFrame


TL;DR: I'm trying to combine several rows of a GeoPandas GeoDataFrame into a single row whose geometry is the union of their shapes.

I'm currently working on a little project that requires me to create interactive choropleth plots of Canadian health regions using a few different metrics.

I had merged two DataFrames, one containing population estimates by year for each health region and the other a GeoDataFrame containing the health regions' geometry, when I noticed that the number of rows wasn't the same.

Upon further inspection, I realized the two datasets didn't include exactly the same health regions. The shapefiles had a few more health regions than the population data, which had amalgamated some of them for methodological reasons.

After noticing the difference, I redid the merge and printed the mismatches so I could figure out what I needed to roll up.

merged_gdf = gdf.merge(df, on='HR_UID')
# HR_UID is the column holding the health region codes. The region names differ
# slightly between datasets, so it's easier to merge on the code.
print(list(set(df['HEALTH_REGION']) - set(merged_gdf['HEALTH_REGION_y'])),
      list(set(gdf['HR_UID']) - set(df['HR_UID'].unique())))

This showed that the missing health region was ['Mamawetan/Keewatin/Athabasca, Saskatchewan']. The GeoDataFrame has those three regions as separate rows, with codes 4711, 4712, and 4713, while the population data has them rolled up into one region with code 4714.
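As an aside, the same comparison can be sketched with an outer merge and pandas' `indicator=True` option, which labels each row by the table it came from. The two small tables below are made-up stand-ins for `gdf` and `df` (the code "4710" and the shortened region names are placeholders, not values from the real files):

```python
import pandas as pd

# Toy stand-ins for the two tables; names and the "4710" row are placeholders.
geo_codes = pd.DataFrame({
    "HR_UID": ["4710", "4711", "4712", "4713"],
    "HEALTH_REGION": ["Region A", "Mamawetan", "Keewatin", "Athabasca"],
})
pop_codes = pd.DataFrame({
    "HR_UID": ["4710", "4714"],
    "HEALTH_REGION": ["Region A", "Mamawetan/Keewatin/Athabasca"],
})

# An outer merge keeps unmatched rows from both sides, and indicator=True
# adds a "_merge" column saying which table each row was found in.
check = geo_codes.merge(pop_codes, on="HR_UID", how="outer",
                        suffixes=("_geo", "_pop"), indicator=True)

only_in_geo = check.loc[check["_merge"] == "left_only", "HR_UID"].tolist()
only_in_pop = check.loc[check["_merge"] == "right_only", "HR_UID"].tolist()
print(only_in_geo)
print(only_in_pop)
```

Codes that appear only on the geometry side are the ones that need rolling up, and codes that appear only on the population side are the rolled-up targets.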

I want to combine the rows of my GeoDataFrame that correspond to the health regions merged in the population data, so that their polygons become one. I went back to the GeoDataFrame and selected the three rows corresponding to those regions:

old_hr=gdf[gdf['HR_UID'].isin({'4711','4712','4713'})]
  HR_UID                                      HEALTH_REGION    SHAPE_AREA  \
31   4711  Mamawetan Churchill River Regional Health Auth...  1.282120e+11   
32   4712          Keewatin Yatthé Regional Health Authority  1.095536e+11   
33   4713                         Athabasca Health Authority  5.657720e+10   

       SHAPE_LEN                                           geometry  
31  1.707619e+06  POLYGON ((5602074.666 2364598.029, 5591985.366...  
32  1.616297e+06  POLYGON ((5212469.723 2642030.691, 5273110.000...  
33  1.142962e+06  POLYGON ((5248633.914 2767057.263, 5249285.640...  

Now I've realized that I'm not sure how to combine polygons in a GeoDataFrame. I tried dissolve(on='HEALTH_REGION'), but that didn't work. I've spent a while looking around online, but so far I can't find anyone asking this particular question. Perhaps I'm missing something.


Solution

  • It turned out to be simpler than I had imagined. I was just confused by some additional columns in the dataframe that weren't necessary for the mapping. I'm new to GeoPandas and mapping in general, so I hadn't realized that SHAPE_AREA and SHAPE_LEN weren't needed.

    Here is the code I used to import the dataframe without the extra columns and then combine the three polygons:

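    # if this is not "pythonic" let me know, I'm still a python rookie, but this
    # worked for me.
    import geopandas as gpd
    import pandas as pd

    # Read the boundary file and drop the columns that aren't needed for mapping.
    gdf = gpd.read_file('data/HR_Boundary_Files/HR_000b18a_e.shp', encoding='utf-8').drop(columns=['FRENAME', 'SHAPE_AREA', 'SHAPE_LEN'])
    gdf = gdf.rename(columns={'ENGNAME': 'HEALTH_REGION'})
    # Set aside the three old regions, then remove them from gdf.
    old_codes = ['4711', '4712', '4713']
    old_hr = gdf[gdf['HR_UID'].isin(old_codes)]
    gdf = gdf[~gdf['HR_UID'].isin(old_codes)]
    # Union the three polygons (GeoPandas 1.0+ also offers union_all()).
    new_region_geometry = old_hr['geometry'].unary_union
    # DataFrame.append was removed in pandas 2.0, so build a one-row
    # GeoDataFrame and concatenate it instead.
    new_row = gpd.GeoDataFrame([['4714', 'Mamawetan/Keewatin/Athabasca Health Region', new_region_geometry]],
                               columns=gdf.columns, crs=gdf.crs)
    gdf = pd.concat([gdf, new_row], ignore_index=True)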
    

    The unary_union property of a GeoSeries returns the union of all its geometries, which gave me the new shape I needed. I then added that to the dataframe with the correct region name and code, and dropped the old regions that made it up.
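For comparison, here is a minimal sketch of how `dissolve` could do the same roll-up. It takes its grouping column through the `by` argument, so one option is to first rewrite the old codes to the new one and then dissolve on the code column. The unit squares and the extra "4710" row are made-up stand-ins for the real boundary data, and this is an alternative to the approach above, not what I actually ran:

```python
import geopandas as gpd
from shapely.geometry import box

# Three adjacent unit squares stand in for the real health-region polygons,
# plus one unrelated placeholder region that should be left untouched.
toy = gpd.GeoDataFrame(
    {
        "HR_UID": ["4710", "4711", "4712", "4713"],
        "geometry": [box(5, 5, 6, 6), box(0, 0, 1, 1),
                     box(1, 0, 2, 1), box(2, 0, 3, 1)],
    }
)

# Map the old codes onto the rolled-up code; other codes stay as they are.
rollup = {"4711": "4714", "4712": "4714", "4713": "4714"}
toy["HR_UID"] = toy["HR_UID"].replace(rollup)

# dissolve groups rows by the `by` column and unions each group's geometry.
dissolved = toy.dissolve(by="HR_UID").reset_index()
print(dissolved)
```

With extra attribute columns such as HEALTH_REGION, `dissolve` keeps the first value in each group by default, so the combined region's name would still need to be set by hand afterwards.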