Tags: r, distance, latitude-longitude, spatial, geosphere

Find closest points (lat / lon) from one data set to a second data set


I have two data sets, A and B, which give the locations of different points in the UK, like this:

    A = data.frame(reference = c("C", "D", "E"), latitude = c(55.32043, 55.59062, 55.60859), longitude = c(-2.3954998, -2.0650243, -2.0650542))
    
    B = data.frame(reference = c("C", "D", "E"), latitude = c(55.15858, 55.60859, 55.59062), longitude = c(-2.4252843, -2.0650542, -2.0650243))

A has 400 rows and B has 1800 rows.
For each row in A, I would like to find the distance in kilometers to each of the three closest points in B, as well as the reference and the latitude/longitude of those three points.

I tried following this post:

R - Finding closest neighboring point and number of neighbors within a given radius, coordinates lat-long

However, even when I follow all the instructions, mainly using the distm function from the geosphere package, the distances come out in a unit that can't possibly be kilometers. I don't see what to change in the code, especially since I am not at all familiar with the geo packages.
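The unit question can be checked on a single pair of points. This is a minimal sketch, assuming the geosphere package is installed: geosphere's distance functions, including `distm`, return distances in meters and take each point as (longitude, latitude), so the result has to be divided by 1000 to get kilometers. The two points are the second rows of A and B above; `p_a`, `p_b`, `d_m` and `d_km` are illustrative names.

    library(geosphere)
    
    # Each point as (longitude, latitude): geosphere expects longitude first
    p_a <- c(-2.0650243, 55.59062)   # second point of A
    p_b <- c(-2.0650542, 55.60859)   # second point of B
    
    # distm() returns a distance matrix (here 1 x 1) in meters
    d_m <- distm(p_a, p_b, fun = distGeo)
    d_m    # about 2000.7, i.e. meters rather than kilometers
    
    # Convert meters to kilometers
    d_km <- d_m / 1000
    d_km

The default `fun` of `distm` is `distHaversine` (a spherical approximation); `distGeo` is passed here so the value matches the ellipsoidal distances used in the answer below.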


Solution

  • Here is a solution using a single loop and vectorizing the distance calculation (geosphere returns meters, so the distances are divided by 1000 to convert them to km).
    The code uses base R's rank function to order the calculated distances.
    The indexes (into B) and the distances of the 3 closest points are stored back in data frame A.

    library(geosphere)
    
    A = data.frame(longitude = c(-2.3954998, -2.0650243, -2.0650542), latitude = c(55.32043, 55.59062, 55.60859))
    B = data.frame(longitude = c(-2.4252843, -2.0650542, -2.0650243), latitude = c(55.15858, 55.60859, 55.59062))
    
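    for(i in seq_len(nrow(A))){
      #calculate the distance to every point of B, converted from meters to km;
      #select the coordinate columns by name so the result columns added to A below are not passed to distGeo
      distances<-geosphere::distGeo(A[i, c("longitude", "latitude")], B[, c("longitude", "latitude")])/1000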
      #rank the calculated distances
      ranking<-rank(distances, ties.method = "first")
    
      #find the 3 shortest and store the indexes of B back in A
      A$shortest[i]<-which(ranking ==1) #Same as which.min()
      A$shorter[i]<-which(ranking==2)
      A$short[i]<-which(ranking ==3)
    
      #store the distances back in A
      A$shortestD[i]<-distances[A$shortest[i]] #Same as min()
      A$shorterD[i]<-distances[A$shorter[i]]
      A$shortD[i]<-distances[A$short[i]]
    }
    A
    
      longitude latitude shortest shorter short shortestD  shorterD   shortD
    1 -2.395500 55.32043        1       3     2  18.11777 36.633310 38.28952
    2 -2.065024 55.59062        3       2     1   0.00000  2.000682 53.24607
    3 -2.065054 55.60859        2       3     1   0.00000  2.000682 55.05710
    

    As M Viking pointed out, for the geosphere package the data must be arranged Lon then Lat.
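The loop above stores only the indexes of B, while the question also asks for the reference and coordinates of each of the three closest points. A minimal loop-free sketch of that, assuming the geosphere package and the question's column names (`reference`, `latitude`, `longitude`, with latitude first): it builds the full distance matrix with `distm` and takes the first `k` entries of `order()` for each row of A. With 400 rows in A and 1800 in B the matrix has 720,000 entries (about 6 MB of doubles). The output column names (`a_row`, `rank`, `b_row`, ...) are illustrative choices, and `k` is assumed not to exceed `nrow(B)`.

    library(geosphere)
    
    # Data as laid out in the question: reference, latitude, longitude
    A <- data.frame(reference = c("C", "D", "E"),
                    latitude  = c(55.32043, 55.59062, 55.60859),
                    longitude = c(-2.3954998, -2.0650243, -2.0650542))
    B <- data.frame(reference = c("C", "D", "E"),
                    latitude  = c(55.15858, 55.60859, 55.59062),
                    longitude = c(-2.4252843, -2.0650542, -2.0650243))
    
    k <- 3  # number of nearest points of B to keep for each row of A
    
    # Select columns by name so geosphere always receives (longitude, latitude)
    a_xy <- as.matrix(A[, c("longitude", "latitude")])
    b_xy <- as.matrix(B[, c("longitude", "latitude")])
    
    # nrow(A) x nrow(B) distance matrix, converted from meters to km
    d_km <- distm(a_xy, b_xy, fun = distGeo) / 1000
    
    # Column j holds the indexes (into B) of the k points closest to row j of A,
    # closest first
    nn <- apply(d_km, 1, function(d) order(d)[seq_len(k)])
    
    # One output row per (point of A, neighbour rank) pair
    result <- data.frame(
      a_row = rep(seq_len(nrow(A)), each = k),
      rank  = rep(seq_len(k), times = nrow(A)),
      b_row = as.vector(nn)
    )
    result$b_reference <- B$reference[result$b_row]
    result$b_latitude  <- B$latitude[result$b_row]
    result$b_longitude <- B$longitude[result$b_row]
    result$distance_km <- d_km[cbind(result$a_row, result$b_row)]
    result

For the sample data this picks the same neighbours as the loop version (for the first point of A: rows 1, 3 and 2 of B). `order()` keeps tied values in their original position, which matches `ties.method = "first"` in the loop.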