Search code examples
rgeosphere

find locations within certain lat/lon distance in r


I have a gridded dataset, with data available at the following locations:

lon <- seq(-179.75,179.75, by = 0.5)
lat <- seq(-89.75,89.75, by = 0.5)

I would like to find all of the data points that are within 500 km of the location:

mylat <- 47.9625
mylon <- -87.0431

I aim to use the geosphere package in R, but the method I've currently written does not seem very efficient:

require(geosphere)
dd2 <- array(dim = c(length(lon),length(lat)))
for(i in 1:length(lon)){
  for(ii in 1:length(lat)){
    clon <- lon[i]
    clat <- lat[ii]
    dd <- as.numeric(distm(c(mylon, mylat), c(clon, clat), fun = distHaversine))
    dd2[i,ii] <- dd <= 500000
  }
}

Here, I loop through each grid in the data and find if the distance is less than 500 km. I then store a variable with either TRUE or FALSE, which I can then use to average the data (other variable). From this method, I want a matrix with TRUE or FALSE for the locations within 500 km from the lat and lon shown. Is there a more efficient method for doing this?


Solution

  • Timings:

    Comparing @nicola's and my version gives:

    Unit: milliseconds
    
                   min         lq      mean     median         uq       max neval
    nicola1 184.217002 219.924647 297.60867 299.181854 322.635960 898.52393   100
    floo01   61.341560  72.063197  97.20617  80.247810  93.292233 286.99343   100
    nicola2   3.992343   4.485847   5.44909   4.870101   5.371644  27.25858   100
    

    My original solution: (IMHO nicola's second version is much cleaner and faster.)

    You can do the following (explanation below)

    require(geosphere)
    my_coord <- c(mylon, mylat)
    dd2 <- matrix(FALSE, nrow=length(lon), ncol=length(lat))
    outer_loop_state <- 0
    for(i in 1:length(lon)){
        coods <- cbind(lon[i], lat)
        dd <- as.numeric(distHaversine(my_coord, coods))
        dd2[i, ] <- dd <= 500000
        if(any(dd2[i, ])){
          outer_loop_state <- 1
        } else {
          if(outer_loop_state == 1){
            break
          }
        }
      }
    

    Explanation:

    For the loop i apply the following logic: enter image description here

    outer_loop_state is initialized with 0. If a row with at least one raster-point inside the circle is found outer_loop_state is set to 1. Once there are no more points within the circle for a given row i break.

    The distm call in @nicola version basically does the same without this trick. So it calculates all rows.

    Code for timings:

    microbenchmark::microbenchmark(
      {allCoords<-cbind(lon,rep(lat,each=length(lon)))
      res<-matrix(distm(cbind(mylon,mylat),allCoords,fun=distHaversine)<=500000,nrow=length(lon))},
      {my_coord <- c(mylon, mylat)
      dd2 <- matrix(FALSE, nrow=length(lon), ncol=length(lat))
      outer_loop_state <- 0
      for(i in 1:length(lon)){
        coods <- cbind(lon[i], lat)
        dd <- as.numeric(distHaversine(my_coord, coods))
        dd2[i, ] <- dd <= 500000
        if(any(dd2[i, ])){
          outer_loop_state <- 1
        } else {
          if(outer_loop_state == 1){
            break
          }
        }
      }},
      {#intitialize the return
        res<-matrix(FALSE,nrow=length(lon),ncol=length(lat))
        #we find the possible value of longitude that can be closer than 500000
        #How? We calculate the distance between us and points with our same lat 
        longood<-which(distm(c(mylon,mylat),cbind(lon,mylat))<500000)
        #Same for latitude
        latgood<-which(distm(c(mylon,mylat),cbind(mylon,lat))<500000)
        #we build the matrix with only those values to exploit the vectorized
        #nature of distm
        allCoords<-cbind(lon[longood],rep(lat[latgood],each=length(longood)))
        res[longood,latgood]<-distm(c(mylon,mylat),allCoords)<=500000}
    )