I have downloaded the large shapefile at the ZIP code level from the Census.
The link is here: cb_2017_us_zcta510_500k.shp (https://www2.census.gov/geo/tiger/TIGER_RD18/LAYER/ZCTA520/)
The problem is that reading it into geopandas shows that, unsurprisingly, it includes Alaska and all the small islands around it.
gg.head(1)
Out[709]:
ZCTA5CE20 GEOID20 CLASSFP20 MTFCC20 FUNCSTAT20 ALAND20 \
0 35592 35592 B5 G6350 S 298552385
AWATER20 INTPTLAT20 INTPTLON20 \
0 235989 +33.7427261 -088.0973903
geometry
0 POLYGON ((-88.24735 33.65390, -88.24713 33.65415, -88.24656 33.65454, -88.24658 33.65479, -88.24672 33.65497, -88.24672 33.65520, -88.24626 33.65559, -88.24601 33.65591, -88.24601 33.65630, -88.24...
I know there is an easy solution in R (it uses the area of each polygon; see "how to remove all the small islands from the Census Shapefile (zip code level)?"), but what can I do here in Python?
Thanks!
This can certainly be done using a CONUS shape definition file. However, the contiguous US (CONUS) has the convenient property of fitting inside a simple longitude/latitude bounding box, while all non-CONUS geographies fall outside it. So the easiest way is to filter on the centroids using a bounding box:
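# generous bounding box around the contiguous US (lon/lat in degrees)
x1, y1, x2, y2 = (-130, 20, -50, 50)
# reproject to lon/lat so the centroids are comparable to the box
gg_wgs84 = gg.to_crs('epsg:4326')
# compute the centroids once; geopandas warns that centroids in a
# geographic CRS are approximate, which is fine for a coarse filter
centroids = gg_wgs84.centroid
# the boolean mask shares gg's index, so it can select rows of gg directly
gg_conus = gg[
    (centroids.x > x1)
    & (centroids.y > y1)
    & (centroids.x < x2)
    & (centroids.y < y2)
]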
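As a variation on the same idea, the `head(1)` output in the question shows `INTPTLAT20` and `INTPTLON20` columns, which appear to hold a Census-provided internal point for each ZCTA as signed strings. Assuming that is what they are, the bounding-box test can be run on those columns directly, without computing centroids. The sketch below uses a small made-up frame: only the first row mirrors the question's output, and the other three rows (with approximate coordinates for Anchorage, Honolulu, and San Juan) are illustrative stand-ins, not real records.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for the real ZCTA frame `gg`. Row 0 mirrors the question's
# output; the other rows are hypothetical, with approximate coordinates.
gg = gpd.GeoDataFrame(
    {
        "ZCTA5CE20": ["35592", "99501", "96813", "00901"],
        "INTPTLAT20": ["+33.7427261", "+61.2", "+21.3", "+18.46"],
        "INTPTLON20": ["-088.0973903", "-149.9", "-157.85", "-066.1"],
    },
    geometry=[
        box(-88.3, 33.6, -88.0, 33.9),
        box(-150.0, 61.1, -149.8, 61.3),
        box(-157.9, 21.2, -157.8, 21.4),
        box(-66.2, 18.4, -66.0, 18.5),
    ],
    crs="EPSG:4326",
)

# Same generous bounding box as in the answer
x1, y1, x2, y2 = (-130, 20, -50, 50)

# The internal-point columns are strings such as "+33.7427261"; cast to float
lon = gg["INTPTLON20"].astype(float)
lat = gg["INTPTLAT20"].astype(float)

# Keep only rows whose internal point lies strictly inside the box
gg_conus = gg[(lon > x1) & (lon < x2) & (lat > y1) & (lat < y2)]

print(gg_conus["ZCTA5CE20"].tolist())  # ['35592']
```

On the real file, the same three filtering lines should apply unchanged, provided the columns carry those names (the suffix differs between vintages, e.g. `INTPTLAT10`).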