r calculated-columns dplyr haversine

Adding Haversine results to dataset as a double type


I would like to add a new column to a dataset containing the distance of each ride. To compute that distance, I apply the haversine formula to the start and end coordinates of each trip.

I can compute the distance, but I am struggling to add it to the data as a column I can read.

Without assigning the result, I get a temporary column (ride_distance) of type double, which is what I want:

filtered_dataset %>% rowwise() %>% 
   mutate(ride_distance=distHaversine(c(start_lon, start_lat), c(end_lon, end_lat)))


But when I try to add that column to the data like this, I get something different:

filtered_dataset$ride_distance <- filtered_dataset %>%
   rowwise() %>% 
   mutate(distHaversine(c(start_lon, start_lat), c(end_lon, end_lat)))


Reading the values of that column with head(), I get unexpected output, and the values don't even appear to be the same.

How can I add my distance values to the data as doubles so I can keep using them for computations?


Solution

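  • You're confusing data frames (tibbles) with columns of tibbles. Your pipeline returns an entire tibble, so assigning its result to filtered_dataset$ride_distance stores that whole tibble inside a single column.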

    This code:

    filtered_dataset %>%
       rowwise() %>% 
       mutate(ride_distance = distHaversine(c(start_lon, start_lat), c(end_lon, end_lat)))
    

    produces the tibble you want but only prints it. Re-assign it to filtered_dataset (i.e. filtered_dataset <- filtered_dataset %>% ...) to keep the new column. You could also use the %<>% operator from the magrittr package, which pipes and assigns the result back in one step: filtered_dataset %<>% rowwise() %>% ...

    Alternatively

    filtered_dataset$ride_distance <- filtered_dataset %>%
       rowwise() %>% 
       mutate(x = distHaversine(c(start_lon, start_lat), c(end_lon, end_lat))) %>%
       pull(x)
    

    would work, because pull(x) extracts the new column as a plain numeric vector.
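
As a further option, not part of the original answer: distHaversine() from geosphere also accepts two-column matrices of (longitude, latitude), so the distances can be computed in one vectorized call without rowwise(). Here is a minimal sketch, assuming the column names from the question. The toy coordinates are made up purely for illustration.

```r
library(dplyr)
library(geosphere)

# Hypothetical toy data standing in for filtered_dataset
# (the coordinates are invented for this example).
filtered_dataset <- tibble(
  start_lon = c(-87.65, -87.63),
  start_lat = c(41.90, 41.88),
  end_lon   = c(-87.62, -87.63),
  end_lat   = c(41.89, 41.88)
)

# cbind() builds one (longitude, latitude) matrix per endpoint, and
# distHaversine() returns one distance per row of those matrices.
# Assigning back to filtered_dataset keeps the new column.
filtered_dataset <- filtered_dataset %>%
  mutate(ride_distance = distHaversine(
    cbind(start_lon, start_lat),
    cbind(end_lon, end_lat)
  ))

# Check that the new column is a plain double vector.
is.double(filtered_dataset$ride_distance)
```

Because the result is assigned back to filtered_dataset, ride_distance is stored as an ordinary double column (in metres, given distHaversine()'s default radius) and can be used in later computations. If you keep the rowwise() version instead, the tibble stays row-wise afterwards, so you will probably want to add ungroup() at the end of the pipeline.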