Search code examples
pythongisgeopandasosmnx

Compute areas for multiple entries of GeoDataFrame


I have some working code that computes the area of a city by name:

def get_area_of_city(city_name):
    # Fetch the geodataframe for the specified city
    city_gdf = ox.geocoder.geocode_to_gdf(city_name, which_result=1)

    # Access the geometry (polygon) of the city from the geodataframe
    city_polygon = city_gdf['geometry']

    # Get the latitude and longitude of the city (centroid of the polygon)
    city_latitude, city_longitude = city_polygon.geometry.centroid.y, city_polygon.geometry.centroid.x

    # Define the Cylindrical Equal Area (CEA) projection centered at the city
    cea_projection = f"+proj=cea +lon_0={city_longitude} +lat_ts={city_latitude} +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs"

    # Reproject the polygon to the CEA projection
    city_polygon_cea = city_polygon.to_crs(cea_projection)

    # Compute the area of the polygon in square meters
    area_square_meters = city_polygon_cea.area

    # You can also convert the area to square kilometers if needed
    area_square_kilometers = area_square_meters / 1000000.0
    
    return area_square_kilometers

Now, I want to adapt this code such that it works with any GeoDataFrame that contains multiple cities. This code should be able to construct a projection for each city and apply it to the polygon to get the area. How can I do this? I currently have the following code:


def get_area_of_geodataframe(gdf):
    # Get a copy of the original GeoDataFrame
    gdf_copy = gdf.copy()

    # Get the latitude and longitude of the centroid of all geometries in the GeoDataFrame
    gdf_copy['city_latitude'] = gdf_copy.geometry.centroid.y
    gdf_copy['city_longitude'] = gdf_copy.geometry.centroid.x

    # Define the Cylindrical Equal Area (CEA) projection for each geometry
    gdf_copy['cea_projection'] = gdf_copy.apply(lambda row: f"+proj=cea +lon_0={row['city_longitude']} +lat_ts={row['city_latitude']} +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs", axis=1)

    # Reproject each geometry to the CEA projection
    gdf_copy['city_polygon_cea'] = gdf_copy.apply(lambda row: row.to_crs(row['cea_projection']), axis=1)

    # Compute the area of each geometry in square meters
    gdf_copy['area_square_meters'] = gdf_copy['city_polygon_cea'].area

    # Convert the area to square kilometers
    gdf_copy['area_square_kilometers'] = gdf_copy['area_square_meters'] / 1000000.0

    # Drop the intermediate columns and return the modified GeoDataFrame
    gdf_copy = gdf_copy.drop(columns=['city_latitude', 'city_longitude', 'cea_projection', 'city_polygon_cea'])
    return gdf_copy

However, the error I revieve is AttributeError: 'Series' object has no attribute 'to_crs' when I call row.to_crs().

How do I have to adapt my code?

I tried to use the above code, but I get an error.


Solution

  • Why is it not working?

    Surprisingly or not, when you take a single row of a geopandas.GeoDataFrame, it ends up being a pandas.Series, thus it does not have the method to_crs().

    type(gdf_copy.iloc[0])
    
    # returns <class 'pandas.core.series.Series'>
    

    There seems to be a debate here on whether or not getting back a pandas.Series should be the expected behavior. This answer provides an interesting indexing trick (notice the double square-brackets):

    type(gdf_copy.iloc[[0]])
    
    # returns <class 'geopandas.geodataframe.GeoDataFrame'>
    

    I don't think we can make use of that trick here though, as apply() let us work with what is already a pandas.Series.

    Should you really want to go that way, you could probably accomplish that by working directly with shapely and pyproj.

    Why might it not be a good idea anyway?

    Had it worked the way you were trying to, you would have ended up with shapely objects with different CRS in a single geometry column, which geopandas does not support. You can have different geometry columns with different CRS though, despite the fact that only one is considered as the main geometry of the GeoDataFrame.

    Suggested way to go...

    Why not reuse your nice and working get_area_of_city(city_name) function?

    df = pd.DataFrame({'city': ['Lyon', 'Paris', 'Marseille']})
    df['area_square_kilometers'] = df.apply(lambda row: get_area_of_city(row['city']), axis=1)
    
    #        city  area_square_kilometers
    # 0       Lyon               47.985400
    # 1      Paris              105.390441
    # 2  Marseille              242.129201
    

    Hope this helps!