python geopandas shapefile

How to keep only the continental U.S.A. from a shapefile at the ZIP code level?


I have downloaded the large ZIP-code-level shapefile from the Census Bureau.

The link is here: cb_2017_us_zcta510_500k.shp (https://www2.census.gov/geo/tiger/TIGER_RD18/LAYER/ZCTA520/). The problem is that, when I read it into geopandas, it obviously includes Alaska and all the small islands around it.


    gg.head(1)
    Out[709]:
      ZCTA5CE20 GEOID20 CLASSFP20 MTFCC20 FUNCSTAT20    ALAND20  \
    0     35592   35592        B5   G6350          S  298552385

       AWATER20   INTPTLAT20    INTPTLON20  \
    0    235989  +33.7427261  -088.0973903

                                                   geometry
    0  POLYGON ((-88.24735 33.65390, -88.24713 33.65415, -88.24656 33.65454, -88.24658 33.65479, -88.24672 33.65497, -88.24672 33.65520, -88.24626 33.65559, -88.24601 33.65591, -88.24601 33.65630, -88.24...

I know there is an easy solution in R that uses the area of each polygon (see "how to remove all the small islands from the Census Shapefile (zip code level)?"), but what can I do here in Python?
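For comparison, here is a minimal, dependency-free sketch of the general area-threshold idea. It assumes each ZIP code is a list of polygon parts (rings of x, y pairs) already projected to an equal-area CRS in meters; `MIN_AREA_M2`, `ring_area`, `drop_small_parts` and `rect` are hypothetical names for illustration, and this is not necessarily what the linked R answer does. It filters per part, because a threshold applied to whole ZIP polygons would also discard legitimately small ZIP codes; note too that an area threshold alone removes islets but not mainland Alaska. In geopandas the analogous steps would be `explode()` followed by `.area` in a projected CRS.

```python
# Hypothetical area-threshold filter; coordinates are assumed to be in an
# equal-area projection with units of meters.

MIN_AREA_M2 = 1_000_000  # hypothetical threshold: 1 km^2


def ring_area(ring):
    """Absolute planar area of a simple polygon ring (shoelace formula)."""
    n = len(ring)
    twice = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    return abs(twice) / 2.0


def drop_small_parts(zips, min_area=MIN_AREA_M2):
    """Remove polygon parts smaller than min_area; drop ZIPs left with no parts."""
    kept = {}
    for zip_code, parts in zips.items():
        big = [ring for ring in parts if ring_area(ring) >= min_area]
        if big:
            kept[zip_code] = big
    return kept


def rect(x, y, w, h):
    """Hypothetical helper: axis-aligned rectangle ring with lower-left corner (x, y)."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


zips = {
    "A": [rect(0, 0, 5_000, 5_000), rect(10_000, 0, 200, 200)],  # mainland + islet
    "B": [rect(50_000, 0, 300, 300)],                            # tiny island only
}
result = drop_small_parts(zips)
print({k: len(v) for k, v in result.items()})
# {'A': 1}
```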

Thanks!


Solution

  • This can certainly be done using a shapefile that defines CONUS (the contiguous United States); however, CONUS has the convenient property of fitting inside a simple longitude/latitude bounding box, while the non-contiguous areas in the file, such as Alaska, Hawaii and Puerto Rico, fall outside it. So the easiest way is to keep only the ZIP codes whose centroid lies within such a box:

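    import geopandas as gpd

    # path to the downloaded Census shapefile
    gg = gpd.read_file("cb_2017_us_zcta510_500k.shp")

    # generous bounding box around CONUS: (min lon, min lat, max lon, max lat)
    x1, y1, x2, y2 = (-130, 20, -50, 50)

    # compute each ZIP code's centroid once, in lon/lat (WGS84); geopandas warns that
    # centroids in a geographic CRS are inexact, which is harmless for this coarse test
    centroids = gg.to_crs("EPSG:4326").centroid
    gg_conus = gg[
        (centroids.x > x1)
        & (centroids.y > y1)
        & (centroids.x < x2)
        & (centroids.y < y2)
    ]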
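To see why the bounding-box test separates CONUS from the rest, here is a dependency-free sketch of the same centroid-in-box logic. It assumes each ZIP polygon is a single exterior ring of (lon, lat) pairs (real ZCTAs can be MultiPolygons with holes, which geopandas handles for you); `ring_centroid`, `in_box`, `keep_conus` and `square` are hypothetical names, and the sample coordinates are approximate and for illustration only.

```python
# Dependency-free sketch of the centroid-in-bounding-box filter.

BBOX = (-130.0, 20.0, -50.0, 50.0)  # (min lon, min lat, max lon, max lat)


def ring_centroid(ring):
    """Area-weighted centroid of a simple polygon ring (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def in_box(lon, lat, bbox=BBOX):
    """True if the point lies strictly inside the bounding box."""
    x1, y1, x2, y2 = bbox
    return x1 < lon < x2 and y1 < lat < y2


def keep_conus(polygons):
    """Keep the polygons whose centroid falls inside the bounding box."""
    return {name: ring for name, ring in polygons.items() if in_box(*ring_centroid(ring))}


def square(lon, lat, half=0.1):
    """Hypothetical helper: a small square 'ZIP polygon' centered on (lon, lat)."""
    return [(lon - half, lat - half), (lon + half, lat - half),
            (lon + half, lat + half), (lon - half, lat + half)]


# Approximate coordinates, for illustration only.
samples = {
    "Kansas": square(-98.6, 39.8),
    "Key West": square(-81.8, 24.6),
    "Seattle": square(-122.3, 47.6),
    "Honolulu": square(-157.9, 21.3),
    "Anchorage": square(-149.9, 61.2),
    "San Juan": square(-66.1, 18.5),
}
print(sorted(keep_conus(samples)))
# ['Kansas', 'Key West', 'Seattle']
```

Honolulu fails on longitude, Anchorage on latitude and San Juan on latitude, which is the same reasoning the generous box above relies on.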