Search code examples
pythongeopandaspyproj

Geopandas: buffer operation seems to ignore the unit of measure of the CRS


My goal here is to make a geodataframe from a couple of columns of coordinates in an existing dataframe, take those 1677 geographic points and add a buffer circle around each, then union the resulting polygons into a multipolygon. Where I keep getting wrapped around the axle is the .buffer() part of geopandas doesn't seem to be using the units of measure for the CRS I've selected.

In  []: ven_coords

Out []:     VenLat      VenLon
       0    42.34768    -71.085359
       1    42.349014   -71.081096
       2    42.347627   -71.081685
       3    42.348718   -71.077984
       4    42.34896    -71.081467
     ...         ...           ...
    1672    42.308962   -71.073516
    1673    42.313169   -71.089027
    1674    42.309717   -71.08247
    1675    42.356336   -71.074386
    1676    42.313005   -71.089887
    1677 rows × 2 columns

In  []: ven_coords_gdf = geopandas.GeoDataFrame(ven_coords, 
                                        geometry=geopandas.points_from_xy(ven_coords.VenLon, ven_coords.VenLat))
        ven_coords_gdf

Out []: VenLat  VenLon  geometry
       0    42.34768    -71.085359  POINT (-71.08536 42.34768)
       1    42.349014   -71.081096  POINT (-71.08110 42.34901)
       2    42.347627   -71.081685  POINT (-71.08168 42.34763)
       3    42.348718   -71.077984  POINT (-71.07798 42.34872)
       4    42.34896    -71.081467  POINT (-71.08147 42.34896)
     ...         ...           ...                        ...
    1672    42.308962   -71.073516  POINT (-71.07352 42.30896)
    1673    42.313169   -71.089027  POINT (-71.08903 42.31317)
    1674    42.309717   -71.08247   POINT (-71.08247 42.30972)
    1675    42.356336   -71.074386  POINT (-71.07439 42.35634)
    1676    42.313005   -71.089887  POINT (-71.08989 42.31300)
    1677 rows × 3 columns

So far so good, let's see what sort of thing I got back:

In  []: print('Type:', type(ven_coords_gdf), "/ current CRS is:",ven_coords_gdf.crs)

Out []: Type: <class 'geopandas.geodataframe.GeoDataFrame'> / current CRS is: None

It has no CRS, so I assign it the one relevant to what I'm working on:

In  []: ven_coords_gdf.crs = ("epsg:2249")
        print('Type:', type(ven_coords_gdf), "/ current CRS is:",ven_coords_gdf.crs)

Out []: Type: <class 'geopandas.geodataframe.GeoDataFrame'> / current CRS is: epsg:2249

It appears to have "taken" the CRS I added, and just to double-check, let's take a look at the details for the CRS in question:

In  []: CRS.from_epsg(2249)

Out []: <Projected CRS: EPSG:2249>
        Name: NAD83 / Massachusetts Mainland (ftUS)
        Axis Info [cartesian]:
        - X[east]: Easting (US survey foot)
        - Y[north]: Northing (US survey foot)
        Area of Use:
        - name: United States (USA) - Massachusetts onshore - counties of Barnstable; Berkshire; Bristol; Essex; Franklin; Hampden; Hampshire; Middlesex; Norfolk; Plymouth; Suffolk; Worcester.
        - bounds: (-73.5, 41.46, -69.86, 42.89)
        Coordinate Operation:
        - name: SPCS83 Massachusetts Mainland zone (US Survey feet)
        - method: Lambert Conic Conformal (2SP)
        Datum: North American Datum 1983
        - Ellipsoid: GRS 1980
        - Prime Meridian: Greenwich

2249 uses the U.S. Survey Foot as it's unit of measure, so I'll set my buffer at 1000 to get a 1000 foot radius from each of the points in my data:

In  []: ven_coords_buffer = ven_coords_gdf.geometry.buffer(distance = 1000)
        ven_coords_buffer

Out []: 0       POLYGON ((928.915 42.348, 924.099 -55.669, 909...
        1       POLYGON ((928.919 42.349, 924.104 -55.668, 909...
        2       POLYGON ((928.918 42.348, 924.103 -55.670, 909...
        3       POLYGON ((928.922 42.349, 924.107 -55.668, 909...
        4       POLYGON ((928.919 42.349, 924.103 -55.668, 909...
                                     ...                        
        1672    POLYGON ((928.926 42.309, 924.111 -55.708, 909...
        1673    POLYGON ((928.911 42.313, 924.096 -55.704, 909...
        1674    POLYGON ((928.918 42.310, 924.102 -55.707, 909...
        1675    POLYGON ((928.926 42.356, 924.110 -55.661, 909...
        1676    POLYGON ((928.910 42.313, 924.095 -55.704, 909...
        Length: 1677, dtype: geometry

Those coordinates are just a wee bit off. Clearly the buffer applied itself as a 1000°, not 1000ft, resulting in a glob of 1677 massive overlapping circles that cover the entire globe. Not quite what I'm looking for. Obviously I'm missing something, any suggestions?

As with any fun code problem, I swear it worked earlier, honest. I futzed around for a while before I finally got it to output the right thing, then I shut it down, went to dinner, came back and re-ran it, and got the above. The obvious deduction is that something I'd done in the aforementioned futzing around had been key to getting it to work, some re-used variable or whatever, but I can't figure out what's missing in the code above.

GeoPandas 0.9.0, pyproj 3.0.1

screenshot from happier times when it worked and I got it onto a map


Solution

  • GeoPandas does exactly what is expected to do. You have to re-project your geometries to a target CRS, simply assigning it does not do anything.

    When creating the GeoDataFrame, make sure you specify in which CRS your data is. In this case it is EPSG:4326 aka geographical projection in degrees.

    ven_coords_gdf = geopandas.GeoDataFrame(ven_coords, 
                                            geometry=geopandas.points_from_xy(ven_coords.VenLon, ven_coords.VenLat),
                                            crs=4326)
    

    Once properly set, you have to reproject (transform) your coordinates to a target CRS using to_crs.

    ven_coords_gdf_projected = ven_coords_gdf.to_crs("epsg:2249")
    

    Now you can use the buffer in feet. If you want to store the result in 4326 again, you just reproject it back using to_crs(4326).

    I swear it worked earlier, honest.

    I am pretty sure it did not :).