How do I make my logical arguments work? The code is fine but the values are somehow flawed


library(lubridate)
library(dplyr)
library(suncalc)

Here are the first 6 rows of my data (as dput output). I kept only the columns needed for what I'm trying to do.

structure(list(Date.of.Capture = structure(c(18383, 18393, 18395, 
18395, 18402, 18815), class = "Date"), Month = c(5L, 5L, 5L, 
5L, 5L, 7L), Day = c(1L, 11L, 13L, 13L, 20L, 7L), Year = c(2020L, 
2020L, 2020L, 2020L, 2020L, 2021L), Time.of.Capture = c("6:24", 
"6:27", "8:55", "8:55", "20:22", "6:26"), Time = structure(c(1588314240, 
1589178420, 1589360100, 1589360100, 1590006120, 1625639160), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), ToD = c(NA, NA, NA, NA, "Daytime", 
NA), lat = c(40.75336, 40.75336, 40.75336, 40.75336, 40.75336, 
40.75336), lon = c(-111.624088, -111.624088, -111.624088, -111.624088, 
-111.624088, -111.624088), sunriseEnd = structure(c(1588336111, 
1589199430, 1589372111, 1589372111, 1589976545, 1625659614), class = c("POSIXct", 
"POSIXt"), tzone = "MST"), sunsetStart = structure(c(1588386056, 
1589250673, 1589423593, 1589423593, 1590028795, 1625713092), class = c("POSIXct", 
"POSIXt"), tzone = "MST"), dawn = structure(c(1588334143, 1589197398, 
1589370065, 1589370065, 1589974454, 1625657442), class = c("POSIXct", 
"POSIXt"), tzone = "MST"), dusk = structure(c(1588388024, 1589252706, 
1589425639, 1589425639, 1590030886, 1625715264), class = c("POSIXct", 
"POSIXt"), tzone = "MST")), class = "data.frame", row.names = c(NA, 
6L))

Below is the code I used to format my columns so that suncalc could use them. I then created a separate data frame of sunlight times and left-joined it to the main "md" data frame.

md$Time <- paste0(md$Year, "-", md$Month, "-", md$Day,  " ", md$Time.of.Capture,":00")

md$Time <- ymd_hms(md$Time)

md$Date.of.Capture <- paste0(md$Year, "-", md$Month, "-", md$Day)

md$Date.of.Capture <- as.Date(md$Date.of.Capture, format = "%Y-%m-%d", tz = "MST")

timesofday <- getSunlightTimes(date = md$Date.of.Capture, 
                               lat = 40.753360, lon = -111.624088, 
                               tz="MST", keep=c("sunriseEnd", "sunsetStart", "dawn", "dusk"))

md <- left_join(md, timesofday, by = c("Date.of.Capture" = "date"))

Here is the code I ran to compare my "Time" column against my "dusk", "dawn", "sunsetStart", and "sunriseEnd" columns for every entry, and then assign each entry a categorical value based on the period in which its "Time" occurred. For example, every row whose "Time" is greater than "dawn" and less than "sunriseEnd" should get "Dawn" in the "ToD" column (and otherwise fall through to the remaining conditions).

md$ToD<-NA

for(i in 1: nrow(md)){
  if (md$Time[i] > md$dawn[i] & md$Time[i] < md$sunriseEnd[i]){
    md$ToD[i] <- "Dawn"
  } else if (md$Time[i] > md$sunsetStart[i] & md$Time[i] < md$dusk[i]){
    md$ToD[i] <- "Dusk"
  } else if (md$Time[i] > md$dusk[i] & md$Time[i] < md$dawn[i]){
    md$ToD[i] <- "Nighttime"
  } else if (md$Time[i] > md$sunriseEnd[i] & md$Time[i] < md$sunsetStart[i]){
    md$ToD[i] <- "Daytime"
  }
}

unique(md$ToD)

I was expecting these logical arguments to work but for some reason they don't. It may be something in the way the formatting code was written because the problem seems to be within the dataset.

When another person tried a simple little test: md$Time > md$dawn. Those should all be TRUE, but instead this is what I get: [1] FALSE FALSE FALSE FALSE TRUE FALSE


Solution

  • Time > dawn is not working as you expect because the timezones are different:

    attr(md$Time, 'tzone')
    # [1] "UTC"
    attr(md$dawn, 'tzone')
    # [1] "MST"
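To make this concrete, here is a minimal base R sketch using the first row of the dput() output from the question. The last step is an assumption rather than something the question confirms: it shows what would happen if "6:24" had actually been read off a local (MST) clock and were parsed in that zone. The names time_utc, dawn_mst, time_local and fmt are only for illustration.

```r
# Values taken from the first row of the dput() output in the question.
time_utc <- as.POSIXct(1588314240, origin = "1970-01-01", tz = "UTC")
dawn_mst <- as.POSIXct(1588334143, origin = "1970-01-01", tz = "MST")

fmt <- "%Y-%m-%d %H:%M:%S"

# The comparison is made on the underlying instants, not on the printed clock times.
time_utc > dawn_mst
# [1] FALSE

# The same stored instant shown on an MST clock: late the previous evening.
format(time_utc, fmt, tz = "MST", usetz = TRUE)
# [1] "2020-04-30 23:24:00 MST"

# ASSUMPTION: if "6:24" was really a local (MST) clock reading, it would have to be
# parsed in that zone, which gives a different instant seven hours later.
time_local <- as.POSIXct("2020-5-1 6:24:00", format = fmt, tz = "MST")
time_local > dawn_mst
# [1] TRUE
```

If the local-clock assumption does apply to your data, lubridate's ymd_hms() also accepts a tz argument, so the parsing step in the question could set the zone there. Whether "MST" or a DST-aware zone is the right choice is something only you can check against how the times were recorded.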
    

    Dealing with time zones can be problematic, especially when many data sources don't include a timezone (inference!), give only an hour offset without verifying DST, or just get it wrong. I'll assume that each field's "tzone" is correct, and that your expectation of TRUE/FALSE came from reading the printed clock times rather than the underlying moments.

    You can remedy this a little (aiding only visual interpretation, not changing the numeric "moment in time" at all) by reassigning the attribute:

    attr(md$dawn, 'tzone') <- "UTC"
    

    (or assign "MST" to Time). Don't forget to verify/change the other POSIXt columns as well.

    I've not done this for the code below; it's over to you if you'd like to. Again, doing this does not change any logic, since reassigning the timezone does not change the moment the value represents.
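As a rough sketch of what "verify/change the other POSIXt columns" could look like, the following base R snippet sets the display zone on every POSIXct column at once and checks that the stored numbers are untouched. It uses rows 1 and 5 of the question's dput() output; md_small and the helper to_time() are hypothetical names introduced here, not part of the original code.

```r
# Rows 1 and 5 of the question's dput() output, rebuilt as a small data frame.
# to_time() is a hypothetical helper, not part of the original post.
to_time <- function(x, tz) as.POSIXct(x, origin = "1970-01-01", tz = tz)
md_small <- data.frame(
  Time = to_time(c(1588314240, 1590006120), "UTC"),
  dawn = to_time(c(1588334143, 1589974454), "MST"),
  dusk = to_time(c(1588388024, 1590030886), "MST")
)

# Underlying epoch seconds before touching any attribute.
before <- lapply(md_small, as.numeric)

# Set the display zone of every POSIXct column to UTC.
is_time <- vapply(md_small, inherits, logical(1), what = "POSIXct")
md_small[is_time] <- lapply(md_small[is_time], function(x) {
  attr(x, "tzone") <- "UTC"
  x
})

# The stored moments are identical; only the printed clock times differ.
after <- lapply(md_small, as.numeric)
identical(before, after)
# [1] TRUE
format(md_small$dawn, "%Y-%m-%d %H:%M:%S", usetz = TRUE)
# [1] "2020-05-01 11:55:43 UTC" "2020-05-20 11:34:14 UTC"
```

With every column printed in the same zone, a visual check of Time against dawn agrees with what the comparison operators return.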

    From here, I think we can use dplyr::case_when for this.

    Notes:

    • While it is unlikely with floating-point comparison that > will fail, I'm going to shift one end each to non-strict inequality >= so that we're "guaranteed" ... strong word ... to enclose all times.

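    • I'm going to introduce the use of between and rearrange the order of comparisons; I think it clarifies the flow a little (though it has little effect on the logic). Note that between is inclusive at both ends, so a time that falls exactly on a boundary is assigned to whichever case_when condition is listed first.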

    • You are comparing a timestamp with "today's" dawn and dusk, but if we are pre-dawn then comparing with the same-row dusk is going to fail since it is much later today. We need to be comparing with yesterday's dusk. I think we have three approaches, each with their imperfections:

      1. Use lag(dusk) ... but since the data do not have exactly one row for every day, this will not work.
      2. Default to "Nighttime", where the other three are unambiguously the same "today". This is probably fine.
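      3. We can use dusk - 86400 (one day of seconds) as a stand-in for yesterday's dusk. It is not exact, but judging from the data above it is only off by about a minute (dusk shifts by roughly 66 seconds per day between 2020-05-11 and 2020-05-13), so it can also work. For this to work, we'll need to also do dawn + 86400 for the late-evening component.

    The code below uses approach 3:
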
    md %>%
      mutate(ToD = case_when(
          dplyr::between(Time, dawn, sunriseEnd)        ~ "Dawn",
          dplyr::between(Time, sunriseEnd, sunsetStart) ~ "Daytime",
          dplyr::between(Time, sunsetStart, dusk)       ~ "Dusk",
          # pre-dawn: compare against (approximately) yesterday's dusk
          dplyr::between(Time, dusk - 86400, dawn) |
          # late evening: compare against (approximately) tomorrow's dawn
          dplyr::between(Time, dusk, dawn + 86400)      ~ "Nighttime"
        )
      )
    #   Date.of.Capture Month Day Year Time.of.Capture                Time       ToD      lat       lon          sunriseEnd
    # 1      2020-05-01     5   1 2020            6:24 2020-05-01 06:24:00 Nighttime 40.75336 -111.6241 2020-05-01 05:28:31
    # 2      2020-05-11     5  11 2020            6:27 2020-05-11 06:27:00 Nighttime 40.75336 -111.6241 2020-05-11 05:17:10
    # 3      2020-05-13     5  13 2020            8:55 2020-05-13 08:55:00 Nighttime 40.75336 -111.6241 2020-05-13 05:15:11
    # 4      2020-05-13     5  13 2020            8:55 2020-05-13 08:55:00 Nighttime 40.75336 -111.6241 2020-05-13 05:15:11
    # 5      2020-05-20     5  20 2020           20:22 2020-05-20 20:22:00   Daytime 40.75336 -111.6241 2020-05-20 05:09:05
    # 6      2021-07-07     7   7 2021            6:26 2021-07-07 06:26:00 Nighttime 40.75336 -111.6241 2021-07-07 05:06:54
    #           sunsetStart                dawn                dusk
    # 1 2020-05-01 19:20:56 2020-05-01 04:55:43 2020-05-01 19:53:44
    # 2 2020-05-11 19:31:13 2020-05-11 04:43:18 2020-05-11 20:05:06
    # 3 2020-05-13 19:33:13 2020-05-13 04:41:05 2020-05-13 20:07:19
    # 4 2020-05-13 19:33:13 2020-05-13 04:41:05 2020-05-13 20:07:19
    # 5 2020-05-20 19:39:55 2020-05-20 04:34:14 2020-05-20 20:14:46
    # 6 2021-07-07 19:58:12 2021-07-07 04:30:42 2021-07-07 20:34:24
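For comparison, approach 2 from the list above (default to "Nighttime") can be sketched as follows. This is only an illustration under the same assumption that the stored "tzone" values are correct. It uses rows 1 and 5 of the question's dput() output, writes the comparisons out explicitly as half-open intervals instead of calling between, and adds an is.na guard that is my own addition. md_small and to_time() are hypothetical names.

```r
library(dplyr)

# Rows 1 and 5 of the question's dput() output (epoch seconds), in their stored zones.
# to_time() is a hypothetical helper, not part of the original post.
to_time <- function(x, tz) as.POSIXct(x, origin = "1970-01-01", tz = tz)
md_small <- data.frame(
  Time        = to_time(c(1588314240, 1590006120), "UTC"),
  dawn        = to_time(c(1588334143, 1589974454), "MST"),
  sunriseEnd  = to_time(c(1588336111, 1589976545), "MST"),
  sunsetStart = to_time(c(1588386056, 1590028795), "MST"),
  dusk        = to_time(c(1588388024, 1590030886), "MST")
)

md_small <- md_small %>%
  mutate(ToD = case_when(
    # Without this guard a missing Time would fall through to "Nighttime".
    is.na(Time)                              ~ NA_character_,
    Time >= dawn        & Time < sunriseEnd  ~ "Dawn",
    Time >= sunriseEnd  & Time < sunsetStart ~ "Daytime",
    Time >= sunsetStart & Time < dusk        ~ "Dusk",
    # Everything outside today's dawn-to-dusk window is treated as night.
    TRUE                                     ~ "Nighttime"
  ))

md_small$ToD
# [1] "Nighttime" "Daytime"
```

The trade-off is that the default branch also catches anything unexpected, for example a row whose dawn or dusk is missing, so it is worth checking those columns for NA before relying on it.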