
How to calculate distance using coordinates


I'm working on bike-sharing data and have variables for the start and end longitude and latitude of each trip. I want to use these coordinates to calculate the distance so that I can analyse it, but my code isn't working. Can anyone help, please? I'm new to R. This is what I have tried:

    Annual_Trips <- Annual_Trips %>% 
      rowwise() %>% 
      mutate(Distance = distHaversine(c(start_lng, start_lat), c(end_lng, end_lat)))


    Annual_Trips <- Annual_Trips %>% 
      rowwise() %>% 
      mutate(Distance = distm(c(start_lng, start_lat), c(end_lng, end_lat), fun = distHaversine))

P.S. I loaded the geosphere, tidyverse and dplyr packages.

When I run the code, it just keeps running and never finishes. What am I doing wrong? Ideally I want to show the distance in km or miles.

This is a subset of the data frame for context:

structure(list(ride_id = c("620BC6107255BF4C", "4471C70731AB2E45", 
"26CA69D43D15EE14", "362947F0437E1514", "BB731DE2F2EC51C5"), 
    rideable_type = c("electric_bike", "electric_bike", "electric_bike", 
    "electric_bike", "electric_bike"), started_at = structure(c(1634903202, 
    1634803957, 1634398119, 1634397468, 1634768274), class = c("POSIXct", 
    "POSIXt"), tzone = ""), ended_at = structure(c(1634903390, 
    1634804054, 1634398586, 1634397543, 1634768770), class = c("POSIXct", 
    "POSIXt"), tzone = ""), start_station_name = c("Kingsbury St & Kinzie St", 
    "", "", "", ""), start_station_id = c("KA1503000043", "", 
    "", "", ""), end_station_name = c("", "", "", "", ""), end_station_id = c("", 
    "", "", "", ""), start_lat = c(41.8891863333333, 41.93, 41.92, 
    41.92, 41.89), start_lng = c(-87.6384953333333, -87.7, -87.7, 
    -87.69, -87.71), end_lat = c(41.89, 41.93, 41.94, 41.92, 
    41.89), end_lng = c(-87.63, -87.71, -87.72, -87.69, -87.69
    ), member_casual = c("member", "member", "member", "member", 
    "member")), row.names = c(NA, 5L), class = "data.frame")

Solution

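  • It looks like you are doing row-wise operations on a large dataset, which calls distHaversine once per row. distHaversine is vectorized, so feeding it whole columns at once (here with base::transform instead of dplyr::mutate; dplyr::mutate without rowwise() works too) should be much faster. Note that the resulting distances are in metres: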

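    library(geosphere)  # provides distHaversine()
    # cbind() builds two-column (longitude, latitude) matrices, one row per trip
    Annual_Trips |>
      transform(Distance=distHaversine(cbind(start_lng, start_lat), cbind(end_lng, end_lat)))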
    #            ride_id rideable_type          started_at
    # 1 620BC6107255BF4C electric_bike 2021-10-22 13:46:42
    # 2 4471C70731AB2E45 electric_bike 2021-10-21 10:12:37
    # 3 26CA69D43D15EE14 electric_bike 2021-10-16 17:28:39
    # 4 362947F0437E1514 electric_bike 2021-10-16 17:17:48
    # 5 BB731DE2F2EC51C5 electric_bike 2021-10-21 00:17:54
    #              ended_at       start_station_name start_station_id
    # 1 2021-10-22 13:49:50 Kingsbury St & Kinzie St     KA1503000043
    # 2 2021-10-21 10:14:14                                          
    # 3 2021-10-16 17:36:26                                          
    # 4 2021-10-16 17:19:03                                          
    # 5 2021-10-21 00:26:10                                          
    #   end_station_name end_station_id start_lat start_lng end_lat end_lng
    # 1                                  41.88919  -87.6385   41.89  -87.63
    # 2                                  41.93000  -87.7000   41.93  -87.71
    # 3                                  41.92000  -87.7000   41.94  -87.72
    # 4                                  41.92000  -87.6900   41.92  -87.69
    # 5                                  41.89000  -87.7100   41.89  -87.69
    #   member_casual  Distance
    # 1        member  709.8101
    # 2        member  828.1745
    # 3        member 2774.9420
    # 4        member    0.0000
    # 5        member 1657.3871
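If you want to see what distHaversine computes, or avoid the package dependency, here is a minimal base-R sketch of the same haversine formula, applied to whole columns rather than row by row. It assumes a spherical Earth with radius 6378137 m, which is geosphere's default radius for distHaversine, so it should reproduce the Distance column above. It also converts the metres to kilometres and miles. `haversine_m`, `trips`, `Distance_m`, `Distance_km` and `Distance_mi` are hypothetical names chosen for this example.

```r
# Vectorized haversine distance in base R (no packages needed).
# Assumption: spherical Earth with radius r = 6378137 m, the default
# used by geosphere::distHaversine, so results should match it.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  lon1 <- lon1 * to_rad; lat1 <- lat1 * to_rad
  lon2 <- lon2 * to_rad; lat2 <- lat2 * to_rad
  dlat <- lat2 - lat1
  dlon <- lon2 - lon1
  a <- sin(dlat / 2)^2 + cos(lat1) * cos(lat2) * sin(dlon / 2)^2
  a <- pmin(a, 1)                       # guard against rounding just above 1
  2 * r * atan2(sqrt(a), sqrt(1 - a))   # one distance in metres per element
}

# Coordinates of the five sample rows from the question
trips <- data.frame(
  start_lat = c(41.8891863333333, 41.93, 41.92, 41.92, 41.89),
  start_lng = c(-87.6384953333333, -87.7, -87.7, -87.69, -87.71),
  end_lat   = c(41.89, 41.93, 41.94, 41.92, 41.89),
  end_lng   = c(-87.63, -87.71, -87.72, -87.69, -87.69)
)

# Whole columns are passed in at once, so there is no per-row loop
trips$Distance_m  <- with(trips, haversine_m(start_lng, start_lat, end_lng, end_lat))
trips$Distance_km <- trips$Distance_m / 1000
trips$Distance_mi <- trips$Distance_m / 1609.344  # 1 international mile = 1609.344 m
```

The same unit conversion works on the geosphere result directly, e.g. `Distance_km = distHaversine(cbind(start_lng, start_lat), cbind(end_lng, end_lat)) / 1000`.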