Search code examples
rlubridateswapmutateabsolute-value

Swap start_time with end_time (vice versa) in a table using R


I am dealing with a bike sharing data set and in its raw data, the start_time has been keyed in with the end_time and vice versa. start time is later than end time

Is there a way I can swap the two columns for the affected rows or is there a way to get an absolute result when I use start_time - end_time?

this was what I did to get the ride duration (start_time - end_time):

#new column to show ride duration
# step 1 change type to what we want dttm
Q4_2019 <- Q4_2019 %>% 
  mutate(start_time = as_datetime(start_time)) %>% 
  mutate (end_time = as_datetime(end_time))
# step 2 create new column showing ride_duration
Q4_2019 <- Q4_2019 %>% 
  mutate(ride_duration = end_time - start_time)
df <- structure(list(trip_id = c(25625850, 25625849, 25625851, 25625843, 
25625841, 25625838, 25625830, 25625839, 25241500, 25293548, 25380297, 
25382592, 25425647, 25432182, 25447046, 25535424, 25539306), 
    start_time = structure(c(1572746268, 1572746133, 1572746297, 
    1572745919, 1572745831, 1572745669, 1572745401, 1572745684, 
    1569960882, 1570290715, 1570747626, 1570782378, 1571127078, 
    1571158556, 1571248410, 1571846602, 1571854341), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), end_time = structure(c(1572742886, 
    1572742912, 1572743307, 1572742982, 1572743065, 1572742960, 
    1572743396, 1572743808, 1569960943, 1570290776, 1570747687, 
    1570782439, 1571127139, 1571158617, 1571248471, 1571846663, 
    1571854402), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    bikeid = c(217, 5059, 6133, 2920, 4179, 964, 4141, 2214, 
    6029, 2081, 5397, 976, 6130, 3819, 1725, 6372, 6058), tripduration = c(5310, 
    379, 609, 663, 834, 891, 1594, 1724, 61, 61, 61, 61, 61, 
    61, 61, 61, 61), from_station_id = c(340, 109, 301, 460, 
    298, 229, 632, 131, 293, 76, 328, 308, 664, 174, 77, 91, 
    253)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-17L))

Solution

  • library(tidyverse)
    
    # using mutate
    df %>%
      mutate(diff = abs(difftime(start_time, end_time)),
             start_time = pmin(start_time, end_time), # credit to Darren Tsai for pmin idea :-) 
             end_time = start_time + diff) %>%
      select(-diff) # you can delete this (and the last %>%) if you want to keep the diff column 
    
    # using transform (credit to G. Grothendieck)
    df %>%
      transform(start_time = pmin(start_time, end_time),
                end_time = pmax(start_time, end_time))