
Reverse geocoding speed


I'm using R to extract latitude and longitude from a data frame and then getting an address using reverse geocoding.

I have some toy data here:

latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
x = data.frame(latitude,longitude)

I write a function to do the actual geocoding:

require(ggmap)
get_address <- function(row){
  ## apply() passes each row as a named vector, so index by name
  ## rather than with `$` (which only works on lists/data frames)
  long <- as.numeric(row["longitude"])
  lat <- as.numeric(row["latitude"])
  revgeocode(c(long, lat))
}

Then apply:

apply(x,1,get_address)

Using system.time(), this takes about a second. However, I plan to do this for data with over a million observations. If it's going to take a while to run, I don't mind, but since I'm fairly new to this, I never know whether long running times are just an inevitable part of getting the data or are due to poor function design. Is there an obvious way to significantly speed up this operation?

EDIT:

I learned from commenters that I'm going to be limited in the number of free requests (2,500/day) I can make. All of my data comes from New York, and the purpose is to match latitude/longitude coordinates with a borough name. Before I found out about the daily restrictions for free users, I had planned to get the address from Google using lat/long coordinates, extract the zip code from this address, then match the zip to a borough. Does anyone have suggestions on how to do this without using the Google Maps Geocoding API?


Solution

  • You could find a 'spatial' data source of the boroughs, then perform geometric operations to find points in polygons using the sf library


    Setting up the data

Find a spatial data source. Here is one with the New York City boroughs in GeoJSON format:

    library(sf)
    
    sf <- sf::st_read("https://raw.githubusercontent.com/blackmad/neighborhoods/master/new-york-city-boroughs.geojson")
    

Convert your coordinates into an sf object. Note I've swapped your lat & lon column order, since sf expects coordinates as (longitude, latitude).

    latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
    longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
    x = data.frame(longitude, latitude)
    
    sf_x <- sf::st_as_sf(x, coords = c("longitude", "latitude"))
    

    To perform spatial operations, the coordinate reference system needs to match between the two geometries

## set the coordinate reference systems to be the same
    st_crs(sf_x) <- st_crs(sf)
    
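Alternatively, since GeoJSON coordinates are defined on WGS84 by specification, you could set the CRS on the points explicitly (a sketch, assuming the borough file follows that convention):

```r
library(sf)

## GeoJSON is defined on WGS84, so EPSG:4326 is a safe assumption here
sf_x <- sf::st_set_crs(sf_x, 4326)
```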

Point-in-polygon calculation

Use st_within to find the polygons (boroughs) each point is in

    res <- st_within(sf_x, sf)  ## return the indexes of sf that sf_x are within
    

This gives you a sparse list of the indexes of the polygons that each point falls in

## view the results
sapply(res, function(i) as.character(sf$name[i]))
    # [1] "Bronx"     "Queens"    "Manhattan" "Brooklyn"  "Queens" 
    
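At the scale of a million points, some coordinates may fall outside every borough polygon, in which case st_within returns an empty index for that point. A guarded version of the lookup (a sketch) maps those to NA instead of dropping them:

```r
## return NA for points that are not inside any borough polygon
x$borough <- sapply(res, function(i) {
  if (length(i) == 0) NA_character_ else as.character(sf$name[i[1]])
})
```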

    Visual

    Confirm with a plot

library(googleway)

x$borough <- sapply(res, function(i) as.character(sf$name[i]))

mapKey <- "map_api_key"

google_map(key = mapKey) %>%
  add_markers(data = x, info_window = "borough")


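On the original question of speed: the whole point-in-polygon lookup runs locally, with no API quota to worry about. As a one-call alternative (a sketch using sf's st_join), you can attach the polygon attributes directly to the points:

```r
## st_join performs the same point-in-polygon test and carries the
## matching polygon's columns (including name) onto each point
joined <- sf::st_join(sf_x, sf, join = sf::st_within)
joined$name  # one borough name per point; NA if outside all polygons
```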


    Further Reading