Search code examples
rgeocodinggeospatialshapefilepoint-in-polygon

How to group latitude/longitude data into different groups based on a shapefile?


Original Problem : I have a data set where each row has latitude and longitude within limits of New York. Now I need to group each row into one of the zipcodes in New York. I have shapefiles with all the boundaries available from https://gis.ny.gov/gisdata/inventories/details.cfm?DSID=934.

Adding sample data for latitude, longitude http://pastebin.com/mXntxhK2


Solution

  • over / %over% works pretty well as @RobertH suggested.

    library(sp)
    library(raster)
    library(rgdal)
    library(dplyr)
    
    # get the shapefile without wasting bandwidth
    URL <- "http://gis.ny.gov/gisdata/data/ds_934/zip_codes_shp.zip"
    fil <- "nyzips.zip"
    if (!file.exists(fil)) download.file(URL, fil)
    shp <- grep("shp$", unzip(fil), value=TRUE)
    ny <- readOGR(shp[2], ogrListLayers(shp[2])[1], stringsAsFactors=FALSE)
    
    # you didn't give us data so we have to create some by random sampling
    # within the bounding box
    ny_area <- as(extent(bbox(ny)), "SpatialPolygons")
    set.seed(1492) # reproducible
    pts <- spsample(ny_area, 3000, "random")
    proj4string(pts) <- proj4string(ny)
    
    # this does the lon/lat to zip mapping
    zip_where <- pts %over% ny
    
    # since we fabricated data, not all will be in a zip code since
    # ny isn't a rectangle, so we remove the "bad" data
    zip_where <- zip_where[complete.cases(zip_where),]
    
    arrange(count(zip_where, POSTAL), desc(n))
    
    ## Source: local data frame [602 x 2]
    ## 
    ##    POSTAL     n
    ##     (chr) (int)
    ## 1   12847    16
    ## 2   12980    14
    ## 3   13367    14
    ## 4   13625    10
    ## 5   12843     9
    ## 6   12986     9
    ## 7   12134     8
    ## 8   12852     7
    ## 9   13324     7
    ## 10  13331     7
    ## ..    ...   ...
    

    Since you provided a sample of your coordinates, here's how to read them in and convert them to the projection of your NY shapefile so you can do the aggregation:

    pts <- read.csv("http://pastebin.com/raw.php?i=mXntxhK2", na.strings="null")
    pts <- pts[complete.cases(pts),]
    coordinates(pts) <- ~longitude+latitude
    proj4string(pts) <- CRS("+proj=longlat +datum=WGS84")
    pts <- spTransform(pts, proj4string(ny))
    
    # this does the lon/lat to zip mapping
    zip_where <- pts %over% ny
    
    # but since we fabricated data, not all will be in a zip code since
    # ny isn't a rectangle, so we remove the "bad" data
    zip_where <- zip_where[complete.cases(zip_where),]
    
    arrange(count(zip_where, POSTAL), desc(n))
    ## Source: local data frame [158 x 2]
    ## 
    ##    POSTAL     n
    ##     (chr) (int)
    ## 1   11238    28
    ## 2   11208    25
    ## 3   11230    20
    ## 4   10027    19
    ## 5   11229    17
    ## 6   11219    16
    ## 7   11385    16
    ## 8   11206    15
    ## 9   11211    15
    ## 10  11214    14
    ## ..    ...   ...