Search code examples
rdatedatetimedstposixct

R method to shift standard to daylight savings time by shifting entire dataset up an hour/down an hour for half the year?


There are lots of questions about daylight savings conversion and posixct/posixlt, date.time, etc., but none that I have found appear to address what my approach would be to daylight savings.

I am interested in analyzing daily load curves for energy use, and approaches which just cut the spring hour out of the dataset do not work for me. I need an approach that shifts all measurements to the subsequent hour after spring daylight savings and to the prior hour after the fall adjustment. See below for a clear example.


EnergyUse <- data.table("Date"= c("1997-04-06", "1997-04-06", "1997-04-06", "1997-04-06"), "Hour"= 1:4, "Power"=c(30,40,60,80))

print(EnergyUse)
#             Date   Hour     Power
#1:     1997-04-06      1        30
#2:     1997-04-06      2        40 #when daylight savings kicked in for 1997
#3:     1997-04-06      3        60
#4:     1997-04-06      4        80

The "Hour" field ranges from 0 to 23 for every day of the year, i.e. "local standard time". It happens to be Pacific Time, as you will see below, but I would have the same question for any time zone that implemented daylight savings.

Now I need to merge date and time field into single date_time field formatted as date and time and incorporating daylight savings, as I am interested in the hourly power patterns (i.e. load curves), which shift both based on relative time (e.g. when people go to/get off work) and absolute time (e.g. when it gets cold/hot or when sun sets).

EnergyUseAdj <- EnergyUse[, Date_Time := as.POSIXct(paste(Date, Hour), format="%Y-%m-%d %H", tz="America/Los_Angeles")]

which results in:

print(EnergyUseAdj)

#         Date Hour  Power            Date_Time
#1: 1997-04-06    1     30  1997-04-06 01:00:00
#2: 1997-04-06    2     40                 <NA>
#3: 1997-04-06    3     60  1997-04-06 03:00:00
#4: 1997-04-06    4     80  1997-04-06 04:00:00

This, however, makes the "Power" data for the new daylight savings 3am and 4am incorrect. The actual power production figure for the daylight adjusted 3am would instead be that which was listed for 2am standard time (i.e. 40), and that for 4am would then be 60.

The correct way to adjust for this, albeit likely more computationally expensive for large datasets, would be to adjust the entire time-series by a positive offset of 1 hour in spring and a negative offset of 1 hour in fall, like the below:

#         Date Hour  Power            Date_Time
#1: 1997-04-06    1     30  1997-04-06 01:00:00
#2: 1997-04-06    2   <NA>                 <NA>
#3: 1997-04-06    3     40  1997-04-06 03:00:00
#4: 1997-04-06    4     60  1997-04-06 04:00:00

Or, perhaps smoother for use in other algorithms due to lack of NA lines, like the below:

#         Date Hour  Power            Date_Time
#1: 1997-04-06    1     30  1997-04-06 01:00:00
#2: 1997-04-06    3     40  1997-04-06 03:00:00
#3: 1997-04-06    4     60  1997-04-06 04:00:00
#4: 1997-04-06    5     80  1997-04-06 05:00:00

After toying around with Posixct and reading through a bunch of similar questions on this adjustment, I could not find a great solution. Any ideas?

EDIT: GregorThomas' request, see below for a larger sample of data in case you wish to use two days' worth.

#       OP_DATE OP_HOUR Power
# 1: 1997-04-05       0    71
# 2: 1997-04-05       1    61
# 3: 1997-04-05       2    54
# 4: 1997-04-05       3    57
# 5: 1997-04-05       4    68
# 6: 1997-04-05       5    76
# 7: 1997-04-05       6    89
# 8: 1997-04-05       7   106
# 9: 1997-04-05       8   148
#10: 1997-04-05       9   154
#11: 1997-04-05      10   143
#12: 1997-04-05      11   123
#13: 1997-04-05      12   105
#14: 1997-04-05      13    94
#15: 1997-04-05      14    85
#16: 1997-04-05      15    86
#17: 1997-04-05      16    84
#18: 1997-04-05      17    83
#19: 1997-04-05      18    99
#20: 1997-04-05      19   105
#21: 1997-04-05      20    94
#22: 1997-04-05      21    95
#23: 1997-04-05      22    81
#24: 1997-04-05      23    66
#25: 1997-04-06       0    75
#26: 1997-04-06       1    70
#27: 1997-04-06       2    62
#28: 1997-04-06       3    56
#29: 1997-04-06       4    55
#30: 1997-04-06       5    57
#31: 1997-04-06       6    51
#32: 1997-04-06       7    57
#33: 1997-04-06       8    59
#34: 1997-04-06       9    61
#35: 1997-04-06      10    64
#36: 1997-04-06      11    63
#37: 1997-04-06      12    63
#38: 1997-04-06      13    63
#39: 1997-04-06      14    60
#40: 1997-04-06      15    68
#41: 1997-04-06      16    69
#42: 1997-04-06      17    69
#43: 1997-04-06      18    91
#44: 1997-04-06      19   120
#45: 1997-04-06      20   100
#46: 1997-04-06      21    74
#47: 1997-04-06      22    56
#48: 1997-04-06      23    55

Solution

  • If your data is reliably every hour, you can calculate a sequence of hours of the appropriate length. The implementation of POSIX datetimes accounts for daylight savings time, leap years, etc.

    Simplifying the method in my comment, I'd recommending calculating the sequence based on the start time and the length.

    EnergyUseAdj <- EnergyUse[,
      Date_Time := seq(
        from = as.POSIXct(paste(Date[1], Hour[1]), format="%Y-%m-%d %H", tz="America/Los_Angeles"),
        length.out = .N,
        by = "1 hour"
      )]