library(lubridate)
library(dplyr)
library(suncalc)
Here is a list of the first 6 rows of my data. I pulled only necessary columns for what I'm trying to do.
structure(list(Date.of.Capture = structure(c(18383, 18393, 18395,
18395, 18402, 18815), class = "Date"), Month = c(5L, 5L, 5L,
5L, 5L, 7L), Day = c(1L, 11L, 13L, 13L, 20L, 7L), Year = c(2020L,
2020L, 2020L, 2020L, 2020L, 2021L), Time.of.Capture = c("6:24",
"6:27", "8:55", "8:55", "20:22", "6:26"), Time = structure(c(1588314240,
1589178420, 1589360100, 1589360100, 1590006120, 1625639160), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), ToD = c(NA, NA, NA, NA, "Daytime",
NA), lat = c(40.75336, 40.75336, 40.75336, 40.75336, 40.75336,
40.75336), lon = c(-111.624088, -111.624088, -111.624088, -111.624088,
-111.624088, -111.624088), sunriseEnd = structure(c(1588336111,
1589199430, 1589372111, 1589372111, 1589976545, 1625659614), class = c("POSIXct",
"POSIXt"), tzone = "MST"), sunsetStart = structure(c(1588386056,
1589250673, 1589423593, 1589423593, 1590028795, 1625713092), class = c("POSIXct",
"POSIXt"), tzone = "MST"), dawn = structure(c(1588334143, 1589197398,
1589370065, 1589370065, 1589974454, 1625657442), class = c("POSIXct",
"POSIXt"), tzone = "MST"), dusk = structure(c(1588388024, 1589252706,
1589425639, 1589425639, 1590030886, 1625715264), class = c("POSIXct",
"POSIXt"), tzone = "MST")), class = "data.frame", row.names = c(NA,
6L))
Below is the code I used to format my columns so that suncalc could use them. I then created a separate dataframe, which I left-joined to the main "md" dataframe.
md$Time <- paste0(md$Year, "-", md$Month, "-", md$Day, " ", md$Time.of.Capture,":00")
md$Time <- ymd_hms(md$Time)
md$Date.of.Capture <- paste0(md$Year, "-", md$Month, "-", md$Day)
md$Date.of.Capture <- as.Date(md$Date.of.Capture, format = "%Y-%m-%d", tz = "MST")
timesofday <- getSunlightTimes(date = md$Date.of.Capture,
                               lat = 40.753360, lon = -111.624088,
                               tz = "MST", keep = c("sunriseEnd", "sunsetStart", "dawn", "dusk"))
md <- left_join(md, timesofday, by = c("Date.of.Capture" = "date"))
Here is the code I tried to run. It should compare my "Time" column with my "dusk", "dawn", "sunsetStart", and "sunriseEnd" columns for every entry and assign a category based on when the entry occurred. For example, every value in the "Time" column that is greater than "dawn" and less than "sunriseEnd" should be labeled "Dawn" in the "ToD" column (falling through to the other conditions otherwise).
md$ToD<-NA
for (i in 1:nrow(md)) {
  if (md$Time[i] > md$dawn[i] & md$Time[i] < md$sunriseEnd[i]) {
    md$ToD[i] <- "Dawn"
  } else if (md$Time[i] > md$sunsetStart[i] & md$Time[i] < md$dusk[i]) {
    md$ToD[i] <- "Dusk"
  } else if (md$Time[i] > md$dusk[i] & md$Time[i] < md$dawn[i]) {
    md$ToD[i] <- "Nighttime"
  } else if (md$Time[i] > md$sunriseEnd[i] & md$Time[i] < md$sunsetStart[i]) {
    md$ToD[i] <- "Daytime"
  }
}
unique(md$ToD)
I was expecting these logical conditions to work, but for some reason they don't. It may be something in the way the formatting code was written, because the problem seems to be within the dataset.
Another person suggested a simple test: md$Time > md$dawn. Those should all be TRUE, but instead this is what I get: [1] FALSE FALSE FALSE FALSE TRUE FALSE
Time > dawn is not working as you expect because the timezones are different:
attr(md$Time, 'tzone')
# [1] "UTC"
attr(md$dawn, 'tzone')
# [1] "MST"
Dealing with time zones can be problematic, especially when many data sources don't include the timezone (inference!), give only an hour offset without verifying DST, or simply get it wrong. I'll assume that each field's "tzone" is correct, and that your expectation of TRUE/FALSE is merely due to visual cues.
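To make the "visual cues" point concrete, here is a minimal sketch using the epoch seconds from row 1 of the sample data. The variable names are only illustrative, and it assumes your system's time zone database knows "MST" (as the question's own data implies).

```r
# Epoch seconds copied from row 1 of the sample data (Time and dawn).
time_utc <- as.POSIXct(1588314240, origin = "1970-01-01", tz = "UTC")
dawn_mst <- as.POSIXct(1588334143, origin = "1970-01-01", tz = "MST")

# Printed in their own zones, Time *looks* later than dawn ...
shown_time <- format(time_utc, "%Y-%m-%d %H:%M:%S", tz = "UTC")
shown_dawn <- format(dawn_mst, "%Y-%m-%d %H:%M:%S", tz = "MST")
shown_time  # "2020-05-01 06:24:00"
shown_dawn  # "2020-05-01 04:55:43"

# ... but the comparison is made on the underlying instants, not the clock text.
time_utc > dawn_mst  # FALSE

# The same Time value, displayed in MST, falls late on the previous evening.
time_in_mst <- format(time_utc, "%Y-%m-%d %H:%M:%S", tz = "MST")
time_in_mst  # "2020-04-30 23:24:00"
```

If the capture times were actually recorded as local clock times, that would be a data-entry question rather than a comparison problem, and is outside the assumption stated above.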
You can remedy this a little (aiding only visual interpretation, not changing the numeric "moment in time" at all) by reassigning the attribute:
attr(md$dawn, 'tzone') <- "UTC"
(or assign "MST" to Time). Don't forget to verify/change the other POSIXt columns as well.
I've not done this for the code below; over to you if you'd like to do it. Again, doing this does not change any logic, since reassigning the timezone does not change what moment the value represents at all.
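A small sketch of that claim, using the dawn value from row 1 of the sample data: both the base-R attribute assignment and lubridate's with_tz() change only how the value is displayed. The variable names here are illustrative only.

```r
library(lubridate)

# dawn from row 1 of the sample data, stored with tzone "MST".
dawn <- as.POSIXct(1588334143, origin = "1970-01-01", tz = "MST")
secs_before <- as.numeric(dawn)

# Option 1: reassign the attribute directly (as suggested above).
dawn_attr <- dawn
attr(dawn_attr, "tzone") <- "UTC"

# Option 2: lubridate::with_tz(), which also changes only the display zone.
dawn_with <- with_tz(dawn, tzone = "UTC")

# The underlying number of seconds since the epoch is untouched either way.
as.numeric(dawn_attr) == secs_before  # TRUE
as.numeric(dawn_with) == secs_before  # TRUE

# Only the printed clock time differs.
format(dawn, "%H:%M:%S", usetz = TRUE)       # "04:55:43 MST"
format(dawn_attr, "%H:%M:%S", usetz = TRUE)  # "11:55:43 UTC"
```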
From here, I think we can use dplyr::case_when for this.
Notes:
- While it is unlikely that a strict > comparison will fail on floating-point timestamps, I'm going to shift one end of each interval to the non-strict >= so that we're "guaranteed" (strong word) to enclose all times.
- I'm going to introduce the use of between and rearrange the order of comparisons; I think it clarifies the flow a little (though it has little effect on the logic).
- You are comparing a timestamp with "today's" dawn and dusk, but if we are pre-dawn, then comparing with the same-row dusk is going to fail since it is much later today. We need to be comparing with yesterday's dusk. I think we have three approaches, each with their imperfections:
  1. lag(dusk) ... but since we don't have exactly one row per day for every day, this will not work.
  2. Let anything not matched by the other three conditions fall through to "Nighttime", where the other three are unambiguously the same "today". This is probably fine.
  3. dusk - 86400 (one day of seconds). It is almost certainly inaccurate, but it is likely only off by about a minute (in the sample data, dusk moves by 133 seconds between 2020-05-11 and 2020-05-13), so it can also work. For this to work, we'll need to also do dawn + 86400 for the late-evening component.

md %>%
  mutate(ToD = case_when(
    dplyr::between(Time, dawn, sunriseEnd) ~ "Dawn",
    dplyr::between(Time, sunriseEnd, sunsetStart) ~ "Daytime",
    dplyr::between(Time, sunsetStart, dusk) ~ "Dusk",
    dplyr::between(Time, dusk - 86400, dawn) |
      dplyr::between(Time, dusk, dawn + 86400) ~ "Nighttime"
    )
  )
# Date.of.Capture Month Day Year Time.of.Capture Time ToD lat lon sunriseEnd
# 1 2020-05-01 5 1 2020 6:24 2020-05-01 06:24:00 Nighttime 40.75336 -111.6241 2020-05-01 05:28:31
# 2 2020-05-11 5 11 2020 6:27 2020-05-11 06:27:00 Nighttime 40.75336 -111.6241 2020-05-11 05:17:10
# 3 2020-05-13 5 13 2020 8:55 2020-05-13 08:55:00 Nighttime 40.75336 -111.6241 2020-05-13 05:15:11
# 4 2020-05-13 5 13 2020 8:55 2020-05-13 08:55:00 Nighttime 40.75336 -111.6241 2020-05-13 05:15:11
# 5 2020-05-20 5 20 2020 20:22 2020-05-20 20:22:00 Daytime 40.75336 -111.6241 2020-05-20 05:09:05
# 6 2021-07-07 7 7 2021 6:26 2021-07-07 06:26:00 Nighttime 40.75336 -111.6241 2021-07-07 05:06:54
# sunsetStart dawn dusk
# 1 2020-05-01 19:20:56 2020-05-01 04:55:43 2020-05-01 19:53:44
# 2 2020-05-11 19:31:13 2020-05-11 04:43:18 2020-05-11 20:05:06
# 3 2020-05-13 19:33:13 2020-05-13 04:41:05 2020-05-13 20:07:19
# 4 2020-05-13 19:33:13 2020-05-13 04:41:05 2020-05-13 20:07:19
# 5 2020-05-20 19:39:55 2020-05-20 04:34:14 2020-05-20 20:14:46
# 6 2021-07-07 19:58:12 2021-07-07 04:30:42 2021-07-07 20:34:24
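For comparison, the fall-through idea from the notes (treat everything that is not Dawn, Daytime, or Dusk as "Nighttime") could look roughly like the sketch below. It is not the code that produced the output above: md_small and the helper mk() are made up for illustration, the rows are built from rows 1 and 5 of the sample data, and plain comparison operators stand in for between so that each interval is half-open.

```r
library(dplyr)

# Hypothetical helper: build a POSIXct from epoch seconds in a given zone.
mk <- function(secs, tz) as.POSIXct(secs, origin = "1970-01-01", tz = tz)

# Rows 1 and 5 of the sample data, keeping each column's original tzone.
md_small <- data.frame(
  Time        = mk(c(1588314240, 1590006120), "UTC"),
  dawn        = mk(c(1588334143, 1589974454), "MST"),
  sunriseEnd  = mk(c(1588336111, 1589976545), "MST"),
  sunsetStart = mk(c(1588386056, 1590028795), "MST"),
  dusk        = mk(c(1588388024, 1590030886), "MST")
)

res <- md_small %>%
  mutate(ToD = case_when(
    Time >= dawn        & Time < sunriseEnd  ~ "Dawn",
    Time >= sunriseEnd  & Time < sunsetStart ~ "Daytime",
    Time >= sunsetStart & Time < dusk        ~ "Dusk",
    # Everything else, including rows where a timestamp is NA.
    TRUE ~ "Nighttime"
  ))

res$ToD  # "Nighttime" "Daytime"
```

One trade-off to keep in mind: the catch-all also labels rows with a missing Time or missing sun times as "Nighttime", so you may want an explicit `is.na(Time) ~ NA_character_` line first if that matters for your data.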