I'm working on bike-sharing data that has variables for the start and end longitude and latitude of each trip. I want to use these coordinates to calculate the trip distance so I can analyse it, but my code isn't working. Can anyone help, please? I'm new to R. The code I have tried is:
Annual_Trips <- Annual_Trips %>%
rowwise() %>%
mutate(Distance = distHaversine(c(start_lng, start_lat), c(end_lng, end_lat)))
Annual_Trips <- Annual_Trips %>%
rowwise() %>%
mutate(Distance = distm(c(start_lng, start_lat), c(end_lng, end_lat), fun = distHaversine))
P.S. I loaded the geosphere, tidyverse and dplyr packages.
When I run the code, it just keeps running endlessly. What am I doing wrong? Ideally, I want the distance in km or miles.
This is a subset of the data frame for context:
structure(list(ride_id = c("620BC6107255BF4C", "4471C70731AB2E45",
"26CA69D43D15EE14", "362947F0437E1514", "BB731DE2F2EC51C5"),
rideable_type = c("electric_bike", "electric_bike", "electric_bike",
"electric_bike", "electric_bike"), started_at = structure(c(1634903202,
1634803957, 1634398119, 1634397468, 1634768274), class = c("POSIXct",
"POSIXt"), tzone = ""), ended_at = structure(c(1634903390,
1634804054, 1634398586, 1634397543, 1634768770), class = c("POSIXct",
"POSIXt"), tzone = ""), start_station_name = c("Kingsbury St & Kinzie St",
"", "", "", ""), start_station_id = c("KA1503000043", "",
"", "", ""), end_station_name = c("", "", "", "", ""), end_station_id = c("",
"", "", "", ""), start_lat = c(41.8891863333333, 41.93, 41.92,
41.92, 41.89), start_lng = c(-87.6384953333333, -87.7, -87.7,
-87.69, -87.71), end_lat = c(41.89, 41.93, 41.94, 41.92,
41.89), end_lng = c(-87.63, -87.71, -87.72, -87.69, -87.69
), member_casual = c("member", "member", "member", "member",
"member")), row.names = c(NA, 5L), class = "data.frame")
It looks like you are doing row-wise operations on a large dataset. distHaversine is vectorized, so passing it whole columns (and using base::transform instead of dplyr::mutate) should be much faster:
library(geosphere)
# Pass two-column (longitude, latitude) matrices; the result is in meters
Annual_Trips |>
  transform(Distance = distHaversine(cbind(start_lng, start_lat), cbind(end_lng, end_lat)))
# ride_id rideable_type started_at
# 1 620BC6107255BF4C electric_bike 2021-10-22 13:46:42
# 2 4471C70731AB2E45 electric_bike 2021-10-21 10:12:37
# 3 26CA69D43D15EE14 electric_bike 2021-10-16 17:28:39
# 4 362947F0437E1514 electric_bike 2021-10-16 17:17:48
# 5 BB731DE2F2EC51C5 electric_bike 2021-10-21 00:17:54
# ended_at start_station_name start_station_id
# 1 2021-10-22 13:49:50 Kingsbury St & Kinzie St KA1503000043
# 2 2021-10-21 10:14:14
# 3 2021-10-16 17:36:26
# 4 2021-10-16 17:19:03
# 5 2021-10-21 00:26:10
# end_station_name end_station_id start_lat start_lng end_lat end_lng
# 1 41.88919 -87.6385 41.89 -87.63
# 2 41.93000 -87.7000 41.93 -87.71
# 3 41.92000 -87.7000 41.94 -87.72
# 4 41.92000 -87.6900 41.92 -87.69
# 5 41.89000 -87.7100 41.89 -87.69
# member_casual Distance
# 1 member 709.8101
# 2 member 828.1745
# 3 member 2774.9420
# 4 member 0.0000
# 5 member 1657.3871
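The question asks for km or miles. With its default Earth radius, distHaversine returns meters, so the Distance column above can be converted by simple division. The following is a minimal sketch in base R; the `trips` data frame and the `Distance_km` and `Distance_mi` column names are made up for illustration, and the values are copied from the output above.

```r
# Hypothetical stand-in for the result above: Distance is in meters
trips <- data.frame(Distance = c(709.8101, 828.1745, 2774.9420, 0.0000, 1657.3871))

# 1 km = 1000 m; 1 international mile = 1609.344 m
trips <- transform(trips,
                   Distance_km = Distance / 1000,
                   Distance_mi = Distance / 1609.344)

print(trips)
```

The same two divisions can be added to the transform call in the answer, so the conversion happens in the same vectorized step as the distance calculation.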