I'm trying to find all stations inside Manhattan from the CitiBikes dataset. I can get the shape of Manhattan from OpenStreetMap by
q <- opq(bbox="Manhattan")
q <- add_osm_feature(q, key="name", value="Manhattan")
q <- add_osm_feature(q, key="boundary", value="administrative")
administrative <- osmdata_sf (q)
manhattan = administrative$osm_multipolygons[1]
plot(manhattan, main="Manhattan borough")
and I can convert the data to Simple Geometry points by
cbike <- read.csv(file="201903-citibike-tripdata.csv", stringsAsFactors=F,
sep=",", na.strings=c("NA","NaN", "NULL"))
cbike$start.sf <- cbike %>%
select(lat=start.station.latitude, lon=start.station.longitude) %>%
st_as_sf(coords=c("lon", "lat"), crs=st_crs(manhattan))
but I cannot get which points lie inside manhattan
. I assume that I want to use st_intersection
or st_intersects
in order to get a boolean vector, but I'm a bit lost at how that works: Any predicate comes up as empty.
> inside = cbike$start.sf[1,1]
> inside
Simple feature collection with 1 feature and 0 fields
geometry type: POINT
dimension: XY
bbox: xmin: -74.00945 ymin: 40.71107 xmax: -74.00945 ymax: 40.71107
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
geometry
1 POINT (-74.00945 40.71107)
This is a point on Fulton Street, clearly inside Manhattan. However, st_intersects(inside, manhattan)
and st_within(inside, manhattan)
are both empty. (st_intersects(inside, inside)
and st_intersects(manhattan, manhattan)
are both TRUE
, so I assume it's not that points or multipolygons cannot be intersected in general)
(This is with the osmdata
, sf
, and dplyr
packages.)
Have you checked the validity of the Manhattan geometry? Check out https://www.r-spatial.org/r/2017/03/19/invalid.html#corrup-or-invalid-geometries! You don't seem to be using the lwgeom
libary. Installing it (from sources, likely), loading, and then running st_make_valid
on manhattan
gives a valid geometry that has non-empty intersection with points in Manhattan:
manhattan <- st_make_valid(manhattan)
cbike$start.sf <- cbike %>%
select(lat=start.station.latitude, lon=start.station.longitude) %>%
st_as_sf(coords=c("lon", "lat"), crs=st_crs(manhattan))
cbike$starts.inside.manhattan <- st_intersects(cbike$start.sf, manhattan, sparse=FALSE)[,1]
(the return value of st_intersects
is the n×1 matrix of pairwise intersections, of which only the first column is required)
> cbike$starts.inside.manhattan
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[21] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE