I'm using R to extract latitude and longitude from a data frame and then getting an address using reverse geocoding.
I have some toy data here:
latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
x = data.frame(latitude,longitude)
I write a function to do the actual geocoding:
require(ggmap)
get_address <- function(df){
long <- as.numeric(df$longitude)
lat <- as.numeric(df$latitude)
revgeocode(c(long,lat))
}
Then apply:
apply(x,1,get_address)
Using system.time()
, this takes about a second. However, I plan to do this for data with over a million observations. If it's going to take a while to run, I don't mind, but since I'm fairly new to this, I never know whether long running times are just an inevitable part of getting the data or are due to poor function design. Is there an obvious way to significantly speed up this operation?
EDIT:
I learned from commenters that I'm going to be limited in the number of free requests (2,500/day) I can make. All of my data comes from New York, and the purpose is to match latitude/longitude coordinates with a borough name. Before I found out about the daily restrictions for free users, I had planned to get the address from Google using lat/long coordinates, extract the zip code from this address, then match the zip to a borough. Does anyone have suggestions on how to do this without using the Google Maps Geocoding API?
You could find a 'spatial' data source of the boroughs, then perform geometric operations to find points in polygons using the sf
library
Find a spatial data source. Here is one of the neighbourhoods in geojson format
library(sf)
sf <- sf::st_read("https://raw.githubusercontent.com/blackmad/neighborhoods/master/new-york-city-boroughs.geojson")
Convert your coordinates into a sf
object. I've swapped your lat & lon column order.
latitude <- c(40.84935,40.76306,40.81423,40.63464,40.71054)
longitude <- c(-73.87119,-73.90235,-73.93443,-73.88090,-73.83765)
x = data.frame(longitude, latitude)
sf_x <- sf::st_as_sf(x, coords = c("longitude", "latitude"))
To perform spatial operations, the coordinate reference system needs to match between the two geometries
## set the cooridnate reference systesm to be the same
st_crs(sf_x) <- st_crs(sf)
use st_within
to find the polygons (neighbourhoods) each point is in
res <- st_within(sf_x, sf) ## return the indexes of sf that sf_x are within
This gives you a sparse matrix of the indexes of the polygons that each point is in
## view the results
sapply(res, function(x) as.character(sf$name[x]))
# [1] "Bronx" "Queens" "Manhattan" "Brooklyn" "Queens"
Confirm with a plot
library(googleway)
x$neighbourhood <- sapply(res, function(x) as.character(sf$name[x]))
mapKey <- "map_api_key"
google_map(key = mapKey) %>%
add_markers(data = x, info_window = "neighbourhood")