I am trying to map a group of locations to their nearest metropolitan areas. Dataset 1 (df1) contains the address locations with longitude and latitude. I want to map each address to all the metropolitan areas (in a data frame df2) that lie within a 50-mile radius.
g_lat <- c(45.52306, 40.26719, 34.05223, 37.38605, 37.77493)
g_lon <- c(-122.67648,-86.13490, -118.24368, -122.08385, -122.41942)
address <- c(1,2,3,4,5)
df1 <- data.frame(g_lat, g_lon, address)
g_lat <- c(+37.7737185, +45.5222208,+37.77493)
g_lon <- c(-122.2744317,-098.7041549,-122.41942)
msa <- c(1,2,3)
df2 <- data.frame(g_lat, g_lon, msa)
I want output like the following, showing every msa that an address is associated with:
address    g_lat      g_lon msa
      5 37.77493 -122.41942   1
      5 37.77493 -122.41942   3
How can this be achieved? I have tried the following:
library(geosphere)
# create distance matrix
mat <- distm(df1[,c('g_lon','g_lat')], df2[,c('g_lon','g_lat')], fun=distVincentyEllipsoid)
# assign the name to the point in list1 based on shortest distance in the matrix
df1$locality <- df2$locality[max.col(-mat)]
The distm call fails with this error:
Error in .pointsToMatrix(y) : longitude < -360
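A possible solution: build the full distance matrix (distm returns distances in meters), then keep every address/msa pair that is closer than about 50 miles. Fifty miles is roughly 80,467 m; the code below rounds this to 80,000 m: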
library(geosphere)
mat <- distm(df1[,c('g_lon','g_lat')], df2[,c('g_lon','g_lat')], fun=distVincentyEllipsoid)
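# row (address) and column (msa) indices of every pair within the radius
ri <- row(mat)[mat < 80000]
ci <- col(mat)[mat < 80000]
# repeat each matching address row once per msa it falls within
df3 <- df1[ri,]
df3$msa <- df2[ci, "msa"]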
which gives:
> df3
       g_lat     g_lon address msa
4   37.38605 -122.0838       4   1
5   37.77493 -122.4194       5   1
4.1 37.38605 -122.0838       4   3
5.1 37.77493 -122.4194       5   3
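The same idea can be written with an exact mile-to-meter conversion and `which(..., arr.ind = TRUE)`, which returns the matching row/column pairs in one step. This is only a sketch under the question's sample data; the names `radius_m`, `idx`, `res` and `dist_m` are introduced here for illustration and are not part of the original answer.

```r
library(geosphere)

# Sample data from the question
df1 <- data.frame(
  g_lat   = c(45.52306, 40.26719, 34.05223, 37.38605, 37.77493),
  g_lon   = c(-122.67648, -86.13490, -118.24368, -122.08385, -122.41942),
  address = 1:5
)
df2 <- data.frame(
  g_lat = c(37.7737185, 45.5222208, 37.77493),
  g_lon = c(-122.2744317, -98.7041549, -122.41942),
  msa   = 1:3
)

# 50 miles expressed in meters (1 mile = 1609.344 m)
radius_m <- 50 * 1609.344

# Rows are addresses, columns are metropolitan areas, values are meters
mat <- distm(df1[, c("g_lon", "g_lat")], df2[, c("g_lon", "g_lat")],
             fun = distVincentyEllipsoid)

# Two-column matrix ("row", "col") of every pair inside the radius
idx <- which(mat < radius_m, arr.ind = TRUE)

# One output row per (address, msa) pair, with the distance attached
res <- data.frame(
  df1[idx[, "row"], ],
  msa    = df2$msa[idx[, "col"]],
  dist_m = mat[idx],
  row.names = NULL
)
print(res)
```

Addresses with no metropolitan area inside the radius simply do not appear in `res`; if they should be kept, a left join of `df1` against `res` on `address` is one option.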
The same result using either data.table or dplyr (note that setDT converts df1 to a data.table by reference):
library(data.table)
setDT(df1)[ri][, msa := df2[ci, "msa"]][]
library(dplyr)
df1 %>%
  slice(ri) %>%
  mutate(msa = df2[ci, "msa"])
You can add the distance with:
df3$dist <- mat[cbind(ri, ci)]
which gives:
> df3
       g_lat     g_lon address msa     dist
4   37.38605 -122.0838       4   1 46202.74
5   37.77493 -122.4194       5   1 12774.31
4.1 37.38605 -122.0838       4   3 52359.08
5.1 37.77493 -122.4194       5   3     0.00
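Regarding the `longitude < -360` error in the question: that message indicates at least one longitude in the real data lies outside the range geosphere accepts, which usually points to a data-entry problem (a misplaced decimal point, or swapped latitude and longitude columns). A quick base-R check before calling `distm` can locate such rows. The helper `bad_coords` and the `demo` data frame below are hypothetical, made up for illustration, and the check is deliberately stricter than geosphere's (it flags anything outside -180..180 and -90..90).

```r
# Hypothetical helper: return the row numbers whose coordinates are
# missing or outside the conventional longitude/latitude ranges
bad_coords <- function(df, lon = "g_lon", lat = "g_lat") {
  which(is.na(df[[lon]]) | is.na(df[[lat]]) |
          abs(df[[lon]]) > 180 | abs(df[[lat]]) > 90)
}

# Made-up example: row 2 has a misplaced decimal point in the longitude
demo <- data.frame(
  g_lat = c(45.52306, 40.26719, 34.05223),
  g_lon = c(-122.67648, -8613.490, -118.24368)
)

bad <- bad_coords(demo)
print(demo[bad, ])
```

Running the same check on both df1 and df2 shows which input needs cleaning before the distance matrix is built.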