Tags: r, distance, latitude-longitude, geosphere

Calculate distance between multiple latitude and longitude points


I have a dataset that has latitude and longitude information for participants' home and work, and I'd like to create a new column in the dataset containing the Euclidean distance between home and work for each participant. I think this should be relatively simple, but all the other Q&As I've seen seem to deal with slightly different issues.

To start, I tried running this code (using the geosphere package):

distm(c(homelong, homelat), c(worklong, worklat), fun=distHaversine)

But got an error saying "Error in .pointsToMatrix(x) : Wrong length for a vector, should be 2" because (if I understand correctly) I'm trying to calculate the distance between multiple sets of two points.

Can I adjust this code to get what I'm looking for, or is there something else I should be trying instead? Thanks!


Solution

  • distm() returns a distance matrix, which is not what you want; you want the pairwise distances. So use the distance function (distHaversine(), distGeo(), or whatever) directly:

    library(tidyverse)
    
    locations <- tibble(
        homelong = c(0, 2),
        homelat = c(2, 5),
        worklong = c(70, 60),
        worklat = c(45, 60)
    )
    
    locations <- locations %>%
        mutate(
            dist = geosphere::distHaversine(cbind(homelong, homelat), cbind(worklong, worklat))
        )
    
    locations
    #> # A tibble: 2 × 5
    #>   homelong homelat worklong worklat     dist
    #>      <dbl>   <dbl>    <dbl>   <dbl>    <dbl>
    #> 1        0       2       70      45 8299015.
    #> 2        2       5       60      60 7809933.
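
    To make it concrete what "pairwise" means here, this is a minimal base-R sketch of the haversine formula applied row by row. haversine_m is a hypothetical helper written for illustration, not a geosphere function, and the radius of 6378137 m is an assumption. If those assumptions hold, the values should come out close to the dist column above.

    ```r
    # Hypothetical helper (not part of geosphere): pairwise haversine distance.
    # p1, p2: two-column matrices of (longitude, latitude) in degrees, one row per pair.
    # r: sphere radius in metres (assumed value).
    haversine_m <- function(p1, p2, r = 6378137) {
      to_rad <- pi / 180
      lon1 <- p1[, 1] * to_rad
      lat1 <- p1[, 2] * to_rad
      lon2 <- p2[, 1] * to_rad
      lat2 <- p2[, 2] * to_rad
      a <- sin((lat2 - lat1) / 2)^2 +
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
      a <- pmin(a, 1)  # guard against rounding slightly above 1
      2 * r * atan2(sqrt(a), sqrt(1 - a))
    }

    # Same coordinates as the tibble above, bound into matrices with cbind()
    home <- cbind(c(0, 2), c(2, 5))
    work <- cbind(c(70, 60), c(45, 60))

    d <- haversine_m(home, work)  # one distance per row, in metres
    d / 1000                      # the same distances in kilometres
    ```

    Row i of the first matrix is only ever compared with row i of the second, so the result has one value per participant rather than a full matrix.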
    

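    Note that geosphere functions expect matrices as inputs, so cbind() your columns together into a two-column matrix of longitude and latitude. Don't c() them: that concatenates the columns into one long vector with no dimensions, which loses the pairing between lon and lat. I suspect this is the cause of the error: the vector has more than 2 elements and, unlike a matrix, no row and column structure to tell the function where one point ends and the next begins.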
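
    The difference between c() and cbind() is easy to check in base R. This is a small sketch using the first two columns of the example data; the variable names v and m are only for illustration.

    ```r
    homelong <- c(0, 2)
    homelat  <- c(2, 5)

    v <- c(homelong, homelat)      # concatenated: 0 2 2 5, no dimensions
    m <- cbind(homelong, homelat)  # 2 x 2 matrix, one row per participant

    length(v)  # 4
    dim(v)     # NULL
    dim(m)     # 2 2
    m[1, ]     # first participant's longitude and latitude
    ```

    With a single participant, c(homelong, homelat) happens to have length 2, which is presumably why the c() form looks fine until the columns hold more than one row.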