Search code examples
rgpsgroupingposixct

R - how to assign night time GPS data (i.e data which crosses over midnight) the same start date?


I have GPS tag data which is for over night locations only. This has been filtered from a main data set already which calculated day and night based on sunset and sunrise times. As all day locations have been removed, night could be classed as from 3pm to 10am to allow for seasonal changes and make sure all data points are included.

The data is currently formatted with normal dates and times, e.g. 2019-03-22 21:53:00 and 2019-03-23 02:46:00. However, this data is technically from the same night. I want to add a new column to my data frame in R which groups data per night (rather than on their 'true' date) and assigns the dates/times after midnight to that of the starting date. So with the two dates above, they'd both be 2019-03-22.

So far I have tried the following code but it's making mistakes and the output column is a number (17976) rather than a date:

# Convert DateAndTime column to POSIXct format
sample$DateTime <- as.POSIXct(sample$DateTime, format = "%d/%m/%Y %H:%M")

# Extract date component
sample$Date <- as.Date(sample$DateTime)

# Assign the night date
sample$NightDate <- ifelse(hour(sample$DateTime) < 8,
                       sample$Date - days(1),
                       sample$Date)

# change NightDate from numbers (17976) back to a date
as.Date(sample$NightDate)

# error which occurs: Error in as.Date.numeric(GE.sample$NightDate) : 'origin' must be supplied

Due to data protection I can't share much of my data, but here's an example of the DateTime column which I'm working off:

# DateTime Data set  
2019-03-22 00:34:00
2019-03-22 02:43:00
2019-03-22 04:56:00
2019-03-22 19:28:00
2019-03-22 21:53:00
2019-03-23 02:46:00
2019-03-23 20:32:00
2019-03-23 23:04:00
2019-03-24 01:40:00
2019-03-24 04:17:00
2019-03-24 19:34:00
2019-03-24 23:22:00
2019-03-25 02:03:00
2019-03-25 04:48:00

I'm quite new to R so any help would be appreciated, thank you for your time!


Solution

  • I think there are a couple of problems here.

    1. You call as.Date(.) but don't reassign it back to the frame. This may not be a problem yet because of your error (#2), but know that you need to do

      sample$NightDate <- as.Date(...)
      
    2. A number in date-objects is the number of days since an "origin". Most computing (I believe) including R uses 1970-01-01, but Excel uses 1899-12-30. R will not assume this for you (despite many people really wanting it to default to the "epoch" origin of 1970-01-01), you need to specify it. Try

      sample$NightDate <- as.Date(sample$NightDate, origin = "1970-01-01")
      

      It can be a date object as.Date("1970-01-01") or something that is unambiguously a date, as what I did above.

    3. ifelse drops class, it's a known thing, unfortunately. You can avoid dropping the class by removing the Date object from inside of ifelse's return values. Perhaps

      sample$NightDate <- sample$DateTime - ifelse(hour(sample$DateTime) < 8, 86400, 0)
      

      where 86400 is the number of seconds in one day. (It should be noted that timestamp/POSIXt objects have units of 1 sec and date/Date objects have units of 1 day.

    Bottom line, starting with just DateTime (as a POSIXt timestamp), you can get NightDate with

    sample |>
      transform(
        NightDate = as.Date(trunc(
          DateTime - ifelse(lubridate::hour(DateTime) < 8, 86400, 0),
          by = "days"))
      )
    #               DateTime  NightDate
    # 1  2019-03-22 00:34:00 2019-03-21
    # 2  2019-03-22 02:43:00 2019-03-21
    # 3  2019-03-22 04:56:00 2019-03-21
    # 4  2019-03-22 19:28:00 2019-03-22
    # 5  2019-03-22 21:53:00 2019-03-22
    # 6  2019-03-23 02:46:00 2019-03-22
    # 7  2019-03-23 20:32:00 2019-03-23
    # 8  2019-03-23 23:04:00 2019-03-23
    # 9  2019-03-24 01:40:00 2019-03-23
    # 10 2019-03-24 04:17:00 2019-03-23
    # 11 2019-03-24 19:34:00 2019-03-24
    # 12 2019-03-24 23:22:00 2019-03-24
    # 13 2019-03-25 02:03:00 2019-03-24
    # 14 2019-03-25 04:48:00 2019-03-24
    
    

    The lubridate::hour function is useful. If you aren't using lubridate for anything else and need to reduce your use of other packages, you can use as.integer(format(DateTime, format="%H")) or as.POSIXlt(sample$DateTime)$hour, both are identical in effect (though lubridate::hour has a "fancy" way to render on the console).


    Date

    sample <- structure(list(DateTime = structure(c(1553229240, 1553236980, 1553244960, 1553297280, 1553305980, 1553323560, 1553387520, 1553396640, 1553406000, 1553415420, 1553470440, 1553484120, 1553493780, 1553503680), class = c("POSIXct", "POSIXt"), tzone = "")), row.names = c(NA, -14L), class = "data.frame")