Search code examples
rdistancelatitude-longitude

Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)


I have 2 lists (list1, list2) with latitude / longitudes of various locations. One list (list2) has locality names that list1 does not have.

I want an approximate locality for every point in list1 as well. So I want to take a point in list1, try to look for the nearest point in list2 and take that locality. I repeat for every point in list1. It also want the distance (in meters) and the index of the point (in list1) so I can build some business rules around it - essentially these are 2 new cols that should be added to list1 (near_dist, indx).

I am using the gdist function, but I'm unable to get this to work with data frame inputs.

Example input lists:

list1 <- data.frame(longitude = c(80.15998, 72.89125, 77.65032, 77.60599, 
                                  72.88120, 76.65460, 72.88232, 77.49186, 
                                  72.82228, 72.88871), 
                    latitude = c(12.90524, 19.08120, 12.97238, 12.90927, 
                                 19.08225, 12.81447, 19.08241, 13.00984,
                                 18.99347, 19.07990))
list2 <- data.frame(longitude = c(72.89537, 77.65094, 73.95325, 72.96746, 
                                  77.65058, 77.66715, 77.64214, 77.58415,
                                  77.76180, 76.65460), 
                    latitude = c(19.07726, 13.03902, 18.50330, 19.16764, 
                                 12.90871, 13.01693, 13.00954, 12.92079,
                                 13.02212, 12.81447), 
                    locality = c("A", "A", "B", "B", "C", "C", "C", "D", "D", "E"))

Solution

  • To calculate the geographic distance between two points with latitude/longitude coordinates, you can use several formula's. The package geosphere has the distCosine, distHaversine, distVincentySphere and distVincentyEllipsoid for calculating the distance. Of these, the distVincentyEllipsoid is considered the most accurate one, but is computationally more intensive than the other ones.

    With one of these functions, you can make a distance matrix. Based on that matrix you can then assign locality names based on shortest distance with which.min and the corresponding distance with min (see for this the last part of the answer) like this:

    library(geosphere)
    
    # create distance matrix
    mat <- distm(
      list1[,c('longitude','latitude')],
      list2[,c('longitude','latitude')],
      fun = distHaversine
    )
    
    # assign the name to the point in list1 based on shortest distance in the matrix
    list1$locality <- list2$locality[max.col(-mat)]
    

    this gives:

    > list1
       longitude latitude locality
    1   80.15998 12.90524        D
    2   72.89125 19.08120        A
    3   77.65032 12.97238        C
    4   77.60599 12.90927        D
    5   72.88120 19.08225        A
    6   76.65460 12.81447        E
    7   72.88232 19.08241        A
    8   77.49186 13.00984        D
    9   72.82228 18.99347        A
    10  72.88871 19.07990        A
    

    Another possibility is to assign the locality based on the average longitude and latitude values of the localitys in list2:

    library(dplyr)
    list2a <- list2 %>%
      group_by(locality) %>%
      summarise_each(funs(mean)) %>%
      ungroup()
    mat2 <- distm(
      list1[,c('longitude','latitude')],
      list2a[,c('longitude','latitude')],
      fun = distVincentyEllipsoid
    )
    list1 <- list1 %>%
      mutate(locality2 = list2a$locality[max.col(-mat2)])
    

    or with data.table:

    library(data.table)
    list2a <- setDT(list2)[,lapply(.SD, mean), by=locality]
    mat2 <- distm(
      setDT(list1)[,.(longitude,latitude)],
      list2a[,.(longitude,latitude)],
      fun = distVincentyEllipsoid
    )
    list1[, locality2 := list2a$locality[max.col(-mat2)] ]
    

    this gives:

    > list1
       longitude latitude locality locality2
    1   80.15998 12.90524        D         D
    2   72.89125 19.08120        A         B
    3   77.65032 12.97238        C         C
    4   77.60599 12.90927        D         C
    5   72.88120 19.08225        A         B
    6   76.65460 12.81447        E         E
    7   72.88232 19.08241        A         B
    8   77.49186 13.00984        D         C
    9   72.82228 18.99347        A         B
    10  72.88871 19.07990        A         B
    

    As you can see, this leads in most (7 out of 10) occasions to another assigned locality.


    You can add the distance with:

    list1$near_dist <- apply(mat2, 1, min)
    

    or another approach with max.col (which is highly probable faster):

    list1$near_dist <- mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)]
    
    # or using dplyr
    list1 <- list1 %>% mutate(near_dist = mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)])
    # or using data.table (if not already a data.table, convert it with 'setDT(list1)' )
    list1[, near_dist := mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)] ]
    

    the result:

    > list1
        longitude latitude locality locality2   near_dist
     1:  80.15998 12.90524        D         D 269966.8970
     2:  72.89125 19.08120        A         B  65820.2047
     3:  77.65032 12.97238        C         C    739.1885
     4:  77.60599 12.90927        D         C   9209.8165
     5:  72.88120 19.08225        A         B  66832.7223
     6:  76.65460 12.81447        E         E      0.0000
     7:  72.88232 19.08241        A         B  66732.3127
     8:  77.49186 13.00984        D         C  17855.3083
     9:  72.82228 18.99347        A         B  69456.3382
    10:  72.88871 19.07990        A         B  66004.9900