Extracting a geojson file/object embedded in pandas dataframe


I have a pandas dataframe that is a more complex version of this simplified one:

import pandas as pd

# Test data frame
data = {'Geojson': ['{"geometry": {"coordinates": [[[24.950899, 60.169158], [24.953492, 60.169158],[24.953510, 60.170104],[24.950958, 60.169990]]],"type": "Polygon"},"id": 1,"properties": {"GlobalID": "84756blabla","NAME": "Helsinki Senate Square","OBJECTID": 1,"OBS_CREATEDATE": 1641916981000,"OBS_UPDATEDATE": null, "Area_m2": 6861.47},"type": "Feature"}'],'Name': ["Helsinki Senate Square"], 'Type': ["Polygon"]}

df = pd.DataFrame(data)

df.head()
...
    Geojson Name    Type
0   {"geometry": {"coordinates": [[[24.950899, 60....   Helsinki Senate Square  Polygon

As you can see, the first column holds a GeoJSON feature stored as a string. What I would like to do is extract the GeoJSON value from that column and save it separately as a GeoJSON file, but I've been having trouble doing this. Finding help online is not easy, as most examples show how to extract plain JSON, which has different properties from GeoJSON.

If possible, I'd also like to load the extracted GeoJSON into a geopandas GeoDataFrame within the same Python script.

As you may have guessed, the end goal is to be able to map the data or use it in a GIS context. Since there are many GeoJSON features in the column (not just one, as in my example), the solution may require iteration. The geometry type here is Polygon, but I'd also be interested in a solution that handles other geometry types, e.g. Point, MultiLineString, MultiPolygon, etc.

Any suggestions/solutions would be most welcome.


Solution

  • GeoJSON is a JSON format, so the standard json module can read it. I'd parse each feature and add it to a FeatureCollection.

    Here's an example using your test data:

    import json
    import geopandas as gpd
    
    # list of features: just the one feature in question here
    test_features = ['{"geometry": {"coordinates": [[[24.950899, 60.169158], [24.953492, 60.169158],[24.953510, 60.170104],[24.950958, 60.169990]]],"type": "Polygon"},"id": 1,"properties": {"GlobalID": "84756blabla","NAME": "Helsinki Senate Square","OBJECTID": 1,"OBS_CREATEDATE": 1641916981000,"OBS_UPDATEDATE": null, "Area_m2": 6861.47},"type": "Feature"}']
    
    new_feature_collection = {
        'type': 'FeatureCollection',
        'features': []
    }
    
    for feature in test_features:
        feature = json.loads(feature)
        new_feature_collection['features'].append(feature)
    
    # convert to GeoJSON-formatted string
    geojson_out = json.dumps(new_feature_collection, indent=4)
    
    # show it
    print(geojson_out)
    
    
    # Alternatively, if you're just interested in getting a GeoDataFrame:
    gdf = gpd.GeoDataFrame.from_features([json.loads(feature) for feature in test_features])
    print(gdf)
    
    

    Output:

    {
        "type": "FeatureCollection",
        "features": [
            {
                "geometry": {
                    "coordinates": [
                        [
                            [
                                24.950899,
                                60.169158
                            ],
                            [
                                24.953492,
                                60.169158
                            ],
                            [
                                24.95351,
                                60.170104
                            ],
                            [
                                24.950958,
                                60.16999
                            ]
                        ]
                    ],
                    "type": "Polygon"
                },
                "id": 1,
                "properties": {
                    "GlobalID": "84756blabla",
                    "NAME": "Helsinki Senate Square",
                    "OBJECTID": 1,
                    "OBS_CREATEDATE": 1641916981000,
                    "OBS_UPDATEDATE": null,
                    "Area_m2": 6861.47
                },
                "type": "Feature"
            }
        ]
    }
    
    
                               geometry     GlobalID                    NAME  OBJECTID  OBS_CREATEDATE  OBS_UPDATEDATE  Area_m2
    0  POLYGON ((24.95090 60.16916, ...  84756blabla  Helsinki Senate Square         1   1641916981000            None  6861.47
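
To cover the "many rows" and "save it as a file" parts of the question, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: `build_feature_collection`, the sample `rows` list (a stand-in for `df["Geojson"]`), and the output file name are all hypothetical, and every non-empty cell is assumed to hold one valid GeoJSON Feature string. Only the standard library is used, so geometry types can be mixed freely; nothing here inspects the geometry itself.

```python
import json
import tempfile
from pathlib import Path


def build_feature_collection(geojson_strings):
    """Parse an iterable of GeoJSON Feature strings into one FeatureCollection dict."""
    features = []
    for raw in geojson_strings:
        # Skip missing cells (None or NaN are not strings) and empty strings.
        if not isinstance(raw, str) or not raw.strip():
            continue
        features.append(json.loads(raw))
    return {"type": "FeatureCollection", "features": features}


# Stand-in for df["Geojson"]; mixed geometry types need no special handling.
rows = [
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [24.95, 60.17]}, "properties": {"NAME": "a point"}}',
    None,
    '{"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[24.95, 60.17], [24.96, 60.18]]}, "properties": {"NAME": "a line"}}',
]

collection = build_feature_collection(rows)

# Hypothetical output location; a temporary directory keeps the sketch side-effect free.
out_path = Path(tempfile.mkdtemp()) / "features.geojson"
out_path.write_text(json.dumps(collection, indent=4), encoding="utf-8")

# With geopandas installed, either of these should give a GeoDataFrame:
#   gdf = gpd.read_file(out_path)
#   gdf = gpd.GeoDataFrame.from_features(collection["features"], crs="EPSG:4326")
```

With the real data, passing `df["Geojson"]` in place of `rows` should work the same way, since iterating over a pandas Series yields its values. The `crs="EPSG:4326"` argument in the comment reflects that GeoJSON coordinates are defined as WGS 84 longitude/latitude; without it, `from_features` leaves the CRS unset.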