In this geopandas
example, the Antarctica land mass is dropped from the GeoDataFrame
before reprojecting to the Mercator projection, in order to prevent problem with shapes containing a pole (which would become infinitely large).
I was wondering, if it is possible to find a more robust reprojection method, so that the dataframe does not need to be manually adjusted. Especially since I'm working with a dataset that does not have a separate row for Antarctica:
I have two ideas:
crs
On the epsg website for example, the 'Area of use' is shown:
We could use it to prepare the data before the reprojection: drop any shapes that extend further South than -80 deg, or alternatively intersect it with a shapely Polygon
that describes the area of use of the destination crs, in terms of the source crs - in this case the standard epsg:4326
so Polygon([(-180,-80), (-180,84), ...])
.
Issue with this approach: I'm not sure, if this area of use information is programmatically accessible from somewhere for any crs
, e.g. from the GeoDataFrame
object.
Just do it, and pick out the erroneously reprojcted parts later. In my current case, for example, the reprojected geodataframe gdf_merc = gdf.to_crs(epsg=3395)
does have errors...
...but by searching for the inf
word in the string representation of the geometry, I can find the offending Polygon
within the MultiPolygon
...
In [360]: for i, polygon in enumerate(gdf_merc.geometry[0]):
...: if 'inf' in str(polygon):
...: print(i)
0
...and just remove it:
Issue with this approach: seems elaborate and I'd prefer to prevent any problems from appearing in the first place.
Any thoughts on how to fix either of these methods, or is there a third way?
One remark: I'm interested in the general case, where any crs could be reprojected to, so I don't want to preemptively remove Antarctica ("just in case"), as other projections might be perfectly fine with it, and, more importantly, they might have other problem areas.
Many thanks!
The option 1 is likely the best shot in here. Latest GeoPandas uses pyproj.CRS to store CRS data, from which you can easily extract bounds of the projection.
To extract it from df:
import geopandas as gpd
df = gpd.read_file(gpd.datasets.get_path('nybb'))
df.crs.area_of_use.bounds
To get it from target CRS using pyproj directly:
import pyproj
crs = pyproj.CRS.from_epsg(3395)
crs.area_of_use.bounds
Then you can use built-in geopandas.clip
to clip your data.
from shapely.geometry import box
df = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
crs = pyproj.CRS.from_epsg(3395)
bounds = crs.area_of_use.bounds
clipped = gpd.clip(df, box(*bounds))
clipped.to_crs(crs).plot()