r, datetime, lubridate, posixct

Correcting date-times with offsets to UTC+1


This question might be based on my ignorance of how date-times work, but I am struggling with time zone transformations of some logger data. I have multiple loggers which were read out at multiple time points. Unintentionally, the time zone setting was sometimes changed during readouts, so some periods for some loggers are recorded in GMT+1 and some in GMT+2 (always at a constant offset, with no switching for daylight saving time or similar). I would like to have them all in UTC+1 (i.e. GMT+1) so that they're comparable. I created a data frame with all the measurements (temp) of all the loggers (loggerID) over the whole time period (date_time, a character string at the moment). I added a column specifying the time zone each measurement was recorded in (timezone, either "GMT+1" or "GMT+2"). My first try was to create a POSIXct date_time and conditionally tell the function to use either GMT+1 or GMT+2:

dat_temp <- dat_temp %>%
  mutate(date_time_set = if_else(timezone == "GMT+1", as.POSIXct(date_time, format = "%Y-%m-%d %H:%M:%S", tz = "GMT+1"), as.POSIXct(date_time, format = "%Y-%m-%d %H:%M:%S", tz = "GMT+2")))

I found out that this doesn't work, as time zones have to be specified as places ("Europe/Paris" for example) or simply as "UTC" or "GMT", but without offsets. I can't specify a place in Europe, as this will assume the loggers switched time for daylight saving time, right? I tried the same with lubridate, but ran into the same problem:

dat_temp <- dat_temp %>%
  mutate(date_time_set = if_else(timezone == "GMT+1", ymd_hms(dat_temp$date_time, tz = "GMT+1"), ymd_hms(dat_temp$date_time, tz = "GMT+2")))

Then I thought I could just specify the time as UTC (which is basically wrong) and then manually subtract 1h from the GMT+2 data, which would set everything to UTC+1 (although it would be stored as UTC, which would still be wrong, as the values are now in UTC+1):

dat_temp <- dat_temp %>%
  mutate(date_time_posix = as.POSIXct(date_time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"), # assumes UTC for everything if not specified otherwise
         date_time_corr = if_else(timezone == "GMT+2", date_time_posix - (1*60*60), date_time_posix)) # manually subtract 1h for GMT+2 data

I also played around with with_tz() and force_tz() from lubridate, but to no avail. So my questions are:

  1. Is there some way or function to specify time zones with offsets (GMT+1 or UTC+2 or similar) instead of places, which automatically apply daylight saving time?
  2. If not, is there a way to manually set all my logger data to UTC+1 and then somehow save it as date time without a timezone, so that people opening the data somewhere else on the planet don't accidentally mess up the timezones and data?

Subset of data:

> dput(dat_temp)
structure(list(date_time = c("2021-07-01 00:00:00", "2021-07-01 00:30:00", 
"2021-07-01 01:00:00", "2021-07-01 01:30:00", "2021-07-01 02:00:00", 
"2021-07-01 02:30:00", "2021-07-01 03:00:00", "2021-07-01 03:30:00", 
"2021-07-01 04:00:00", "2021-07-01 04:30:00", "2021-10-16 02:30:00", 
"2021-10-16 03:00:00", "2021-10-16 03:30:00", "2021-10-16 04:00:00", 
"2021-10-16 04:30:00", "2021-10-16 05:00:00", "2021-10-16 05:30:00", 
"2021-10-16 06:00:00", "2021-10-16 06:30:00", "2021-10-16 07:00:00", 
"2021-10-16 07:30:00", "2021-11-03 00:00:00", "2021-11-03 00:30:00", 
"2021-11-03 01:00:00", "2021-11-03 01:30:00", "2021-11-03 02:00:00", 
"2021-11-03 02:30:00", "2021-11-03 03:00:00", "2021-11-03 03:30:00", 
"2021-11-03 04:00:00", "2021-11-03 04:30:00", "2021-11-03 05:00:00", 
"2021-11-19 11:00:00", "2021-11-19 11:30:00", "2021-11-19 12:00:00", 
"2021-11-19 12:30:00", "2021-11-19 13:00:00", "2021-11-19 13:30:00", 
"2021-11-19 14:00:00", "2021-11-19 14:30:00", "2021-11-19 15:00:00", 
"2021-11-19 15:30:00", "2021-11-19 16:00:00"), temp = c(16.427, 
16.141, 15.951, 15.569, 15.282, 14.996, 14.9, 14.709, 14.517, 
14.421, 4.727, 4.623, 4.519, 4.415, 4.311, 4.207, 4.102, 3.998, 
3.893, 3.788, 3.683, 2.73, 2.624, 2.624, 2.624, 2.517, 2.517, 
2.517, 2.517, 2.624, 2.73, 2.837, 0.674, 0.674, 0.784, 1.112, 
1.872, 2.517, 2.943, 3.155, 3.155, 3.049, 2.73), loggerID = c("logger1", 
"logger1", "logger1", "logger1", "logger1", "logger1", "logger1", 
"logger1", "logger1", "logger1", "logger2", "logger2", "logger2", 
"logger2", "logger2", "logger2", "logger2", "logger2", "logger2", 
"logger2", "logger2", "logger1", "logger1", "logger1", "logger1", 
"logger1", "logger1", "logger1", "logger1", "logger1", "logger1", 
"logger1", "logger3", "logger3", "logger3", "logger3", "logger3", 
"logger3", "logger3", "logger3", "logger3", "logger3", "logger3"
), timezone = c("GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2", 
"GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2", 
"GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2", "GMT+2", 
"GMT+2", "GMT+2", "GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1", 
"GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1", 
"GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1", "GMT+1", 
"GMT+1", "GMT+1", "GMT+1")), row.names = c(1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 10L, 142902L, 142903L, 142904L, 142905L, 142906L, 
142907L, 142908L, 142909L, 142910L, 142911L, 142912L, 196225L, 
196226L, 196227L, 196228L, 196229L, 196230L, 196231L, 196232L, 
196233L, 196234L, 196235L, 533387L, 533388L, 533389L, 533390L, 
533391L, 533392L, 533393L, 533394L, 533395L, 533396L, 533397L
), class = "data.frame")

Solution

  • Use lubridate's with_tz() (for display purposes) and force_tz() (to actually change the time zone).

    In your case:

    • with_tz() changes how a date-time is displayed. The underlying instant is not changed.
    • force_tz() changes the time zone the date-time is interpreted in. The clock time stays the same, so the underlying instant is changed.

    Here is some example code for reference.

    library(lubridate)
    library(dplyr)

    # inspect the time zone and time locale of your R session
    Sys.timezone()           # system time zone
    Sys.getlocale("LC_TIME") # locale used when formatting date-times

    # parse with ymd_hms(); with no tz argument the result is in UTC
    t_default <- ymd_hms('2021-07-01 00:00:00')

    # same instant, displayed in UTC+1 ('Etc/GMT-1' means UTC+1: the sign is inverted)
    t_withtz <- ymd_hms('2021-07-01 00:00:00') |> with_tz('Etc/GMT-1')

    # same clock time, reinterpreted as UTC+1, so the instant changes
    t_forcetz <- ymd_hms('2021-07-01 00:00:00') |> force_tz('Etc/GMT-1')

    # check the difference
    ## 0 secs: t_default and t_withtz differ only in how they are displayed
    t_default - t_withtz
    ## 1 hour: t_forcetz is a different instant
    t_default - t_forcetz
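The zone names used in the code above are easy to misread, so here is a small sketch of the sign convention. In the Olson/IANA database the "Etc/GMT" names follow the POSIX convention, where the sign is inverted: "Etc/GMT-1" is one hour ahead of UTC (UTC+1) and "Etc/GMT-2" is two hours ahead (UTC+2). The variable name `t_utc` is only illustrative.

```r
library(lubridate)

# One instant, parsed as midnight UTC.
t_utc <- ymd_hms("2021-07-01 00:00:00", tz = "UTC")

# Display the same instant in the two fixed-offset zones.
# %z prints the numeric offset from UTC.
format(with_tz(t_utc, "Etc/GMT-1"), "%Y-%m-%d %H:%M:%S %z")
format(with_tz(t_utc, "Etc/GMT-2"), "%Y-%m-%d %H:%M:%S %z")

# Both fixed-offset names should be listed among the zones R knows about.
c("Etc/GMT-1", "Etc/GMT-2") %in% OlsonNames()
```

These zones have a fixed offset and no daylight saving rules, which is what the loggers in the question need.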
     
    

    R for Data Science explains the details well: https://r4ds.had.co.nz/dates-and-times.html#time-zones
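As a rough sketch of how this could be applied to the data in the question, the rows can be reinterpreted one by one in the zone they were recorded in and then expressed in UTC+1. This assumes lubridate 1.7.0 or later, which provides force_tzs(), the row-wise version of force_tz(). The columns `tz_olson`, `date_time_utc1` and `date_time_txt` are names made up for this example, and the three rows are copied from the question's dput() output.

```r
library(dplyr)
library(lubridate)

# Three rows in the shape of the question's data.
dat_temp <- data.frame(
  date_time = c("2021-07-01 00:00:00", "2021-10-16 02:30:00", "2021-11-03 00:00:00"),
  temp      = c(16.427, 4.727, 2.73),
  timezone  = c("GMT+2", "GMT+2", "GMT+1")
)

dat_corr <- dat_temp %>%
  mutate(
    # Map the labels to Olson names; note the inverted sign (GMT+2 -> "Etc/GMT-2").
    tz_olson = if_else(timezone == "GMT+2", "Etc/GMT-2", "Etc/GMT-1"),
    # Parse the clock times as UTC, then reinterpret each row in its own zone.
    # force_tzs() returns the resulting instants expressed in tzone_out.
    date_time_utc1 = force_tzs(ymd_hms(date_time, tz = "UTC"),
                               tzones = tz_olson, tzone_out = "Etc/GMT-1"),
    # Plain text in UTC+1 clock time, with no zone information attached.
    date_time_txt = format(date_time_utc1, "%Y-%m-%d %H:%M:%S")
  )

dat_corr
```

The GMT+2 rows end up one hour earlier on the clock and the GMT+1 rows are unchanged. For sharing, one option is to export `date_time_txt` (for example with write.csv()) and state in the column name or the documentation that the values are UTC+1, since a character column carries no time zone of its own.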