
R difftime output differs depending on input format (with vs. without an as.character() wrapper)


Example data:

test <- structure(list(date1 = structure(c(1632745800, 1632745800), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), date2 = structure(c(1641468180, 1641468180), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
"data.frame"))

Is there a reason why the output of difftime() differs depending on whether the inputs are wrapped in as.character() or not? For example:

library(tidyverse)

test <- structure(list(date1 = structure(c(1632745800, 1632745800), 
                                         tzone = "UTC", class = c("POSIXct", "POSIXt")), 
                       date2 = structure(c(1641468180, 1641468180), tzone = "UTC", class = c("POSIXct", "POSIXt"))), 
                  row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))

test %>%
  mutate(
    date_diff  = difftime(date2, date1, units = "days"),
    date_diff2 = difftime(as.character(date2), as.character(date1), units = "days")
  ) %>%
  print.data.frame()
#>                 date1               date2     date_diff    date_diff2
#> 1 2021-09-27 12:30:00 2022-01-06 11:23:00 100.9535 days 100.9951 days
#> 2 2021-09-27 12:30:00 2022-01-06 11:23:00 100.9535 days 100.9951 days

It only differs by ~0.04 days (about one hour) in this case, but is there a reason why? And which one would be considered correct? Thank you!
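A quick arithmetic check, sketched in base R using only the epoch seconds from the dput() output above: POSIXct stores seconds since 1970-01-01 00:00:00 UTC, so the true elapsed time can be computed without any time zone involved. The second line converts the gap between the two printed results into hours; because those printed values are rounded to four decimals, the raw product is about 0.998 rather than exactly 1.

```r
# Epoch seconds copied from the dput() output in the question
secs1 <- 1632745800  # 2021-09-27 12:30:00 UTC
secs2 <- 1641468180  # 2022-01-06 11:23:00 UTC

# Elapsed days between the two instants (86400 seconds per day)
elapsed_days <- (secs2 - secs1) / 86400
round(elapsed_days, 4)
#> [1] 100.9535

# Gap between the two printed difftime results, expressed in hours
gap_hours <- (100.9951 - 100.9535) * 24
round(gap_hours, 2)
#> [1] 1
```

The hand-computed value matches date_diff, which points to the UTC-based result.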


Solution

  • The conversion to character is lossy because it drops the time zone information. Your original datetimes are in UTC. When you call as.character() and the strings are reparsed, they are interpreted as your local time, where one of the dates falls in daylight saving time and the other does not, which adds an extra hour to the difference. The result computed on the original POSIXct columns (date_diff) is therefore the correct one.

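    # Original datetimes, stored in UTC
    x <- as.POSIXct(1632745800, tz = "UTC")
    y <- as.POSIXct(1641468180, tz = "UTC")
    
    x
    #> [1] "2021-09-27 12:30:00 UTC"
    # as.character() drops the time zone
    as.character(x)
    #> [1] "2021-09-27 12:30:00"
    # Reparsing without tz uses the session's local time zone (BST/GMT here)
    as.POSIXct(as.character(x))
    #> [1] "2021-09-27 12:30:00 BST"
    as.POSIXct(as.character(y))
    #> [1] "2022-01-06 11:23:00 GMT"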
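To reproduce the extra hour regardless of where the code runs, the sketch below names the zone explicitly instead of relying on the session default. "Europe/London" is an assumption standing in for the answerer's local zone (suggested by the BST/GMT output above). The last step uses the tz argument of difftime(), which controls the zone used when it converts character input, so the strings are read back as UTC.

```r
# Epoch seconds from the question's dput() output, stored as UTC
date1 <- as.POSIXct(1632745800, tz = "UTC")
date2 <- as.POSIXct(1641468180, tz = "UTC")

# Elapsed time between the two UTC instants (the correct result)
d_utc <- difftime(date2, date1, units = "days")

# Reparse the zone-less strings as wall-clock times in a zone with DST.
# Assumption: "Europe/London" mimics the answerer's session zone.
london1 <- as.POSIXct(as.character(date1), tz = "Europe/London")  # BST, UTC+1
london2 <- as.POSIXct(as.character(date2), tz = "Europe/London")  # GMT, UTC+0

# Extra elapsed seconds introduced by the lossy round trip
gap_secs <- as.numeric(difftime(london2, london1, units = "secs")) -
  as.numeric(difftime(date2, date1, units = "secs"))
gap_secs
#> [1] 3600

# Fix: tell difftime() which zone to use when it parses character input
d_fixed <- difftime(as.character(date2), as.character(date1),
                    tz = "UTC", units = "days")
all.equal(as.numeric(d_fixed), as.numeric(d_utc))
#> [1] TRUE
```

Passing tz = "UTC" works, but keeping the columns as POSIXct and skipping as.character() altogether avoids the round trip entirely.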