python, pandas, geopandas, folium, geopy

Geopandas read geometry from geo_interface as JSON column in DataFrame


I have a GeoDataFrame with a geometry column (of polygons) and a few other columns used for plotting the polygons and their marker popups on a map. I exported this dataframe to a CSV file with to_csv, storing gdf.__geo_interface__ in a geo column alongside the other attribute columns.

The geo column looks like

{'type': 'FeatureCollection', 'features': [{'id': '1', 'type': 'Feature', 'properties': {...}}

How can I read the CSV file back and reconstruct the GeoDataFrame? Specifically, how can I recreate the original columns (polygons and attributes) that I had in the GeoDataFrame?
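
For reference, one plausible sketch of the export step described above (the GeoDataFrame contents and the file name are made up for illustration):

    import geopandas as gpd
    from shapely.geometry import Polygon

    # hypothetical stand-in for the real polygons GeoDataFrame
    gdf = gpd.GeoDataFrame(
        {'name': ['a', 'b']},
        geometry=[Polygon([(0, 0), (1, 0), (1, 1)]),
                  Polygon([(2, 2), (3, 2), (3, 3)])],
        crs='EPSG:4326',
    )

    out = gdf.drop(columns='geometry')
    out['geo'] = str(gdf.__geo_interface__)  # whole feature collection as one text cell
    out.to_csv('polygons.csv', index=False)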


Solution

  • Given the following situation:

    import geopandas
    from shapely.geometry import Point

    d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
    gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
    gdf
    

    you can define a function that flattens any nested JSON:

    import pandas as pd

    def flatten_nested_json_df(df):
        df = df.reset_index()

        # columns whose values are all lists / all dicts
        s = (df.applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df.applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()

        while len(list_columns) > 0 or len(dict_columns) > 0:
            new_columns = []

            for col in dict_columns:
                # expand a dict column into one 'col.key' column per key
                horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
                horiz_exploded.index = df.index
                df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
                new_columns.extend(horiz_exploded.columns)

            for col in list_columns:
                # explode a list column into one row per element
                df = df.drop(columns=[col]).join(df[col].explode().to_frame())
                new_columns.append(col)

            # the freshly created columns may themselves contain lists/dicts
            s = (df[new_columns].applymap(type) == list).all()
            list_columns = s[s].index.tolist()

            s = (df[new_columns].applymap(type) == dict).all()
            dict_columns = s[s].index.tolist()
        return df
    
    
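    As a quick sanity check, here is the function applied to a small hypothetical nested record (the data is made up for illustration):

    toy = pd.json_normalize({'a': 1, 'b': {'c': 2}, 'd': [{'e': 3}, {'e': 4}]})
    flatten_nested_json_df(toy)

    which gives one row per element of d, with every nested key hoisted into its own column:

       index  a  b.c  d.e
    0      0  1    2    3
    0      0  1    2    4
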

    Now, you used geo = gdf.__geo_interface__, which returns something like:

    {'type': 'FeatureCollection',
     'features': [{'id': '0',
       'type': 'Feature',
       'properties': {'col1': 'name1'},
       'geometry': {'type': 'Point', 'coordinates': (1.0, 2.0)},
       'bbox': (1.0, 2.0, 1.0, 2.0)},
      {'id': '1',
       'type': 'Feature',
       'properties': {'col1': 'name2'},
       'geometry': {'type': 'Point', 'coordinates': (2.0, 1.0)},
       'bbox': (2.0, 1.0, 2.0, 1.0)}],
     'bbox': (1.0, 1.0, 2.0, 2.0)}
    

    Note that I called it geo. Then, do this:

    # geo is still the dict here, so no json.dumps round-trip is needed.
    # If the geo column was read back from the CSV as a string (say geo_string,
    # a hypothetical name), parse it first: geo = ast.literal_eval(geo_string)
    # for the single-quoted repr shown above (json.loads if written via json.dumps).
    df = pd.json_normalize(geo)
    flatten_nested_json_df(df)
    

    Which will give you:

    index               type                  bbox features.id features.type  \
    0      0  FeatureCollection  (1.0, 1.0, 2.0, 2.0)           0       Feature   
    0      0  FeatureCollection  (1.0, 1.0, 2.0, 2.0)           1       Feature   
    
              features.bbox features.properties.col1 features.geometry.type  \
    0  (1.0, 2.0, 1.0, 2.0)                    name1                  Point   
    0  (2.0, 1.0, 2.0, 1.0)                    name2                  Point   
    
      features.geometry.coordinates  
    0                    (1.0, 2.0)  
    0                    (2.0, 1.0)
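
    The flattened table is useful for inspection, but if the goal is to get the original GeoDataFrame back, geopandas can rebuild it directly from the parsed feature collection. A minimal sketch (a separate route from the flattening above; the CRS is not stored in __geo_interface__, so it has to be supplied again):

    import geopandas

    # rebuild a GeoDataFrame straight from the parsed __geo_interface__ dict
    gdf2 = geopandas.GeoDataFrame.from_features(geo['features'], crs='EPSG:4326')

    This restores both the col1 attribute column and the geometry column in one step.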