Search code examples
rgisopenstreetmap

How do I filter all points within an OSM shape?


I'm trying to find all stations inside Manhattan from the CitiBikes dataset. I can get the shape of Manhattan from OpenStreetMap by

q <- opq(bbox="Manhattan")
q <- add_osm_feature(q, key="name", value="Manhattan")
q <- add_osm_feature(q, key="boundary", value="administrative")
administrative <- osmdata_sf (q)
manhattan = administrative$osm_multipolygons[1]
plot(manhattan, main="Manhattan borough")

and I can convert the data to Simple Geometry points by

cbike <- read.csv(file="201903-citibike-tripdata.csv", stringsAsFactors=F, 
                  sep=",", na.strings=c("NA","NaN", "NULL"))
cbike$start.sf <- cbike %>%
  select(lat=start.station.latitude, lon=start.station.longitude) %>%
  st_as_sf(coords=c("lon", "lat"), crs=st_crs(manhattan))

but I cannot get which points lie inside manhattan. I assume that I want to use st_intersection or st_intersects in order to get a boolean vector, but I'm a bit lost at how that works: Any predicate comes up as empty.

> inside = cbike$start.sf[1,1]
> inside
Simple feature collection with 1 feature and 0 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: -74.00945 ymin: 40.71107 xmax: -74.00945 ymax: 40.71107
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
                    geometry
1 POINT (-74.00945 40.71107)

This is a point on Fulton Street, clearly inside Manhattan. However, st_intersects(inside, manhattan) and st_within(inside, manhattan) are both empty. (st_intersects(inside, inside) and st_intersects(manhattan, manhattan) are both TRUE, so I assume it's not that points or multipolygons cannot be intersected in general)

(This is with the osmdata, sf, and dplyr packages.)


Solution

  • Have you checked the validity of the Manhattan geometry? Check out https://www.r-spatial.org/r/2017/03/19/invalid.html#corrup-or-invalid-geometries! You don't seem to be using the lwgeom libary. Installing it (from sources, likely), loading, and then running st_make_valid on manhattan gives a valid geometry that has non-empty intersection with points in Manhattan:

    manhattan <- st_make_valid(manhattan)
    
    cbike$start.sf <- cbike %>%
      select(lat=start.station.latitude, lon=start.station.longitude) %>%
      st_as_sf(coords=c("lon", "lat"), crs=st_crs(manhattan))
    
    cbike$starts.inside.manhattan <- st_intersects(cbike$start.sf, manhattan, sparse=FALSE)[,1]
    

    (the return value of st_intersects is the n×1 matrix of pairwise intersections, of which only the first column is required)

    > cbike$starts.inside.manhattan
       [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
      [21]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE