
Reverse geocoding speed


I'm using R to extract latitude and longitude from a data frame and then getting an address using reverse geocoding.

I have some toy data here:

latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
x = data.frame(latitude,longitude)

I write a function to do the actual geocoding:

require(ggmap)
get_address <- function(row){
  ## apply() passes each row as a named vector, so index by name
  ## rather than with `$` (which only works on lists/data frames)
  long <- as.numeric(row["longitude"])
  lat <- as.numeric(row["latitude"])
  revgeocode(c(long, lat))
}

Then apply:

apply(x,1,get_address)

Using system.time(), this takes about a second. However, I plan to do this for data with over a million observations. If it's going to take a while to run, I don't mind, but since I'm fairly new to this, I never know whether long running times are just an inevitable part of getting the data or are due to poor function design. Is there an obvious way to significantly speed up this operation?

EDIT:

I learned from commenters that I'm going to be limited in the number of free requests (2,500/day) I can make. All of my data comes from New York, and the purpose is to match latitude/longitude coordinates with a borough name. Before I found out about the daily restrictions for free users, I had planned to get the address from Google using lat/long coordinates, extract the zip code from this address, then match the zip to a borough. Does anyone have suggestions on how to do this without using the Google Maps Geocoding API?


Solution

  • You could find a 'spatial' data source of the boroughs, then perform geometric operations to find points in polygons using the sf library


    Setting up the data

Find a spatial data source. Here is one with the New York City boroughs in GeoJSON format:

    library(sf)
    
    sf <- sf::st_read("https://raw.githubusercontent.com/blackmad/neighborhoods/master/new-york-city-boroughs.geojson")
    

Convert your coordinates into an sf object. Note I've swapped your lat & lon column order, since sf expects coordinates as (longitude, latitude).

    latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
    longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
    x = data.frame(longitude, latitude)
    
    sf_x <- sf::st_as_sf(x, coords = c("longitude", "latitude"))
    

    To perform spatial operations, the coordinate reference system needs to match between the two geometries

## set the coordinate reference systems to be the same
    st_crs(sf_x) <- st_crs(sf)
    
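Alternatively, since GeoJSON coordinates are defined on WGS84 by specification, you could set the CRS on the points explicitly (a sketch, assuming the borough file follows that convention):

```r
library(sf)

## GeoJSON is defined on WGS84, so EPSG:4326 is a safe assumption here
sf_x <- sf::st_set_crs(sf_x, 4326)
```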

Point-in-polygon calculation

Use st_within to find the polygons (boroughs) each point is in

    res <- st_within(sf_x, sf)  ## return the indexes of sf that sf_x are within
    

This gives you a sparse list of the indexes of the polygons that each point falls in

## view the results
sapply(res, function(i) as.character(sf$name[i]))
    # [1] "Bronx"     "Queens"    "Manhattan" "Brooklyn"  "Queens" 
    
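At the scale of a million points, some coordinates may fall outside every borough polygon, in which case st_within returns an empty index for that point. A guarded version of the lookup (a sketch) maps those to NA instead of dropping them:

```r
## return NA for points that are not inside any borough polygon
x$borough <- sapply(res, function(i) {
  if (length(i) == 0) NA_character_ else as.character(sf$name[i[1]])
})
```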

    Visual

    Confirm with a plot

library(googleway)

x$borough <- sapply(res, function(i) as.character(sf$name[i]))

mapKey <- "map_api_key"

google_map(key = mapKey) %>%
  add_markers(data = x, info_window = "borough")


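On the original question of speed: the whole point-in-polygon lookup runs locally, with no API quota to worry about. As a one-call alternative (a sketch using sf's st_join), you can attach the polygon attributes directly to the points:

```r
## st_join performs the same point-in-polygon test and carries the
## matching polygon's columns (including name) onto each point
joined <- sf::st_join(sf_x, sf, join = sf::st_within)
joined$name  # one borough name per point; NA if outside all polygons
```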


    Further Reading