Tags: dataframe, geopandas, coordinate-systems

Cannot find the correct CRS for my GeoDataFrame for area calculation


I'm trying to calculate the area of the geometries in my GeoDataFrame with geopandas' .area, in square meters, but the calculated areas are unreasonably small, on the order of e-6.

I've provided sample data and my code below. My site is in San Francisco, so I set the crs to be 3857. Then I tried converting to the CRS of the matching UTM zone, based on an answer to a similar problem, but the area still comes out tiny. The answer I followed is at https://gis.stackexchange.com/questions/429601/why-are-my-area-calculations-in-python-so-small-with-area

I wonder if there's any other way to find the correct crs for area calculation.

import pandas as pd
import geopandas as gpd
import utm #pip install utm
from pyproj import CRS
from shapely.geometry import Polygon

def footprint_to_polygon(footprint_str):
    # Input is "lat,lon lat,lon ..."; reverse each pair to (lon, lat), i.e. (x, y).
    points = [tuple(list(map(float, point.split(',')))[::-1]) for point in footprint_str.split()]
    return Polygon(points)
    
def findtheutm(aGeometry):
    """Return the EPSG code of the UTM zone containing the geometry's centroid."""
    x, y, parallell, latband = utm.from_latlon(aGeometry.centroid.y, aGeometry.centroid.x)
    if latband in 'CDEFGHJKLM': #https://www.lantmateriet.se/contentassets/379fe00e09d74fa68550f4154350b047/utm-zoner.gif
        ns = 'S'
    else:
        ns = 'N'
    crs = "+proj=utm +zone={0} +{1}".format(parallell, ns) #https://gis.stackexchange.com/questions/365584/convert-utm-zone-into-epsg-code
    crs = CRS.from_string(crs)
    _, code = crs.to_authority()
    return int(code)
    
data = {
    'FootprintPointsStr': [
        "37.777289,-122.403477 37.777351,-122.403556 37.777433,-122.403671 37.777382,-122.403745",
        "37.776745,-122.40807 37.776437,-122.408476 37.776313,-122.408355 37.776636,-122.40806",
        "37.777837,-122.407172 37.777643,-122.406931 37.777532,-122.407089 37.777748,-122.407287",
        "37.776093,-122.408003 37.77624,-122.407812 37.776171,-122.407729 37.776017,-122.407935",
        "37.774312,-122.412135 37.77462,-122.41251 37.774707,-122.41242 37.774378,-122.41205"
    ]
}

# Create a DataFrame
df = pd.DataFrame(data)
df['geometry'] = df['FootprintPointsStr'].apply(footprint_to_polygon)
gdf = gpd.GeoDataFrame(df, geometry='geometry')
gdf.set_crs(epsg=3857, inplace=True)
gdf['area'] = gdf['geometry'].area
gdf['area2'] = gdf.to_crs(epsg=findtheutm(gdf.geometry.iloc[0])).area
gdf

Solution

  • "My site is in San Francisco, so I set the crs to be 3857."

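    Well, that assumption is the whole problem. Your coordinates are lat/lon values in degrees (i.e., EPSG:4326), not projected coordinates in meters (i.e., EPSG:3857). Labeling them as EPSG:3857 makes geopandas treat the degree values as if they were meters, so the areas are really in square degrees, which is why they are so small.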

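    # Declare the CRS the coordinates are actually in: lat/lon degrees (EPSG:4326).
    gdf = gpd.GeoDataFrame(df, crs="EPSG:4326")
    
    # Reproject to a metric UTM CRS before measuring, so .area is in square meters.
    gdf["area (utm)"] = gdf.to_crs(epsg=findtheutm(gdf.geometry.iloc[0])).area
    gdf["area (gpd)"] = gdf.to_crs(gdf.estimate_utm_crs()).area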
    

    NB: geopandas has a useful estimate_utm_crs method, which defaults to the WGS 84 datum, so a hand-written helper like findtheutm isn't strictly needed.

    Output (gdf), with both area columns in square meters:

                  FootprintPointsStr                       geometry  area (utm)  area (gpd)
    0  37.777289,-122.403477 37.7...  POLYGON ((-122.40348 37.77...  103.581864  103.581864
    1  37.776745,-122.40807 37.77...  POLYGON ((-122.40807 37.77...  600.906279  600.906279
    2  37.777837,-122.407172 37.7...  POLYGON ((-122.40717 37.77...  487.886274  487.886274
    3  37.776093,-122.408003 37.7...  POLYGON ((-122.408 37.7760...  251.645312  251.645312
    4  37.774312,-122.412135 37.7...  POLYGON ((-122.41214 37.77...  550.760413  550.760413
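
As a rough sanity check on the numbers above, the effect of the wrong CRS label can be reproduced without geopandas. The sketch below is not part of the original answer: it applies the shoelace formula to the first footprint twice, once on the raw degree values and once after scaling degrees to meters with a local flat-Earth approximation. The constant 111,320 m per degree of latitude and the cosine scaling for longitude are assumptions of a spherical Earth, adequate only for small shapes like these; for real work, to_crs remains the right tool.

```python
import math

# First footprint from the question, as (lat, lon) pairs in degrees.
footprint = [
    (37.777289, -122.403477),
    (37.777351, -122.403556),
    (37.777433, -122.403671),
    (37.777382, -122.403745),
]


def shoelace_area(points):
    """Planar polygon area via the shoelace formula, in squared input units."""
    x0, y0 = points[0]  # shift to the first vertex to limit rounding error
    shifted = [(x - x0, y - y0) for x, y in points]
    total = 0.0
    for (x1, y1), (x2, y2) in zip(shifted, shifted[1:] + shifted[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Case 1: treat degrees as planar units, which is what a mislabeled
# projected CRS amounts to. The result is in square degrees.
xy_degrees = [(lon, lat) for lat, lon in footprint]
area_deg2 = shoelace_area(xy_degrees)

# Case 2: scale degrees to meters around the footprint's mean latitude.
# ASSUMPTION: spherical Earth, about 111,320 m per degree of latitude.
M_PER_DEG_LAT = 111_320.0
mean_lat = sum(lat for lat, _ in footprint) / len(footprint)
m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(mean_lat))

xy_meters = [(lon * m_per_deg_lon, lat * M_PER_DEG_LAT) for lat, lon in footprint]
area_m2 = shoelace_area(xy_meters)

print(f"area in square degrees: {area_deg2:.3e}")
print(f"approximate area in m^2: {area_m2:.1f}")
```

The first value should come out many orders of magnitude below one, while the second should land close to the roughly 103.58 m² reported for row 0, which suggests the UTM-based areas are plausible.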