I would like to add a new column to a dataset containing the distance of each ride. To compute that distance, I apply the haversine formula to the given start and end coordinates of each trip.
I can compute the distance, but I am struggling to add that column to the data in a form I can read.
Without assigning the result, the following prints a temporary column (ride_distance) of type double, which is what I want:
filtered_dataset %>% rowwise() %>%
mutate(ride_distance=distHaversine(c(start_lon, start_lat), c(end_lon, end_lat)))
When I try to add that column to the data with the code below, I get something else instead:
filtered_dataset$ride_distance <- filtered_dataset %>%
rowwise() %>%
mutate(distHaversine(c(start_lon, start_lat), c(end_lon, end_lat)))
When I read the values of that column with head(), I get something different, and they don't even appear to be the same values.
How could I add my distance values to the data as doubles so I can keep using it for computations?
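You're confusing data frames (tibbles) with columns of tibbles. mutate() returns the whole tibble, not just the new column, so assigning its result to filtered_dataset$ride_distance stores an entire tibble inside that one column.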
This code:
filtered_dataset %>%
rowwise() %>%
mutate(ride_distance = distHaversine(c(start_lon, start_lat), c(end_lon, end_lat)))
produces the output you want; if you re-assign it to filtered_dataset (i.e. filtered_dataset <- filtered_dataset %>% ...) you'll get what you want. You could also use the %<>% operator from the magrittr package, which assigns and pipes at the same time: filtered_dataset %<>% rowwise() %>% ...
Alternatively
filtered_dataset$ride_distance <- filtered_dataset %>%
rowwise() %>%
mutate(x = distHaversine(c(start_lon, start_lat), c(end_lon, end_lat))) %>%
pull(x)
would work.
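To check the re-assignment approach end to end, here is a minimal sketch on toy data. The coordinates are made up for illustration and only stand in for the real filtered_dataset. The trailing ungroup() is my own addition, not part of the code above; it just drops the rowwise grouping so later operations act on the whole column.

```r
library(dplyr)
library(geosphere)

# Hypothetical toy data standing in for the real filtered_dataset
filtered_dataset <- tibble(
  start_lon = c(-87.65, -87.63),
  start_lat = c(41.90, 41.88),
  end_lon   = c(-87.62, -87.63),
  end_lat   = c(41.89, 41.88)
)

# Re-assign the whole tibble rather than a single column
filtered_dataset <- filtered_dataset %>%
  rowwise() %>%
  mutate(ride_distance = distHaversine(c(start_lon, start_lat),
                                       c(end_lon, end_lat))) %>%
  ungroup()  # assumption: drop the rowwise grouping for later steps

# The new column is a plain double vector (distances in metres)
is.double(filtered_dataset$ride_distance)
head(filtered_dataset$ride_distance)
```

The second toy ride starts and ends at the same point, so its distance should be zero, which makes a quick sanity check.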