Search code examples
rlubridatetidy

Filter Timeseries Data Using Sunrise and Sunset Times


I'm trying to subset data that falls between sunset and sunrise. The data:

library(tidyverse)
library(lubridate)
library(suncalc)

dat <- tibble(datetime = seq(as.POSIXct('2020-08-03 00:00:00'), 
                             as.POSIXct('2020-08-09 12:00:00'), 
                             by=3600),
              var1 = rnorm(157,2,1),
              var2 = rnorm(157,3,5)) %>% 
  mutate(getSunlightTimes(date = as.Date(datetime, format = '%m/%d/%Y'),
                         lat = 43.1, lon = -76.2, tz = 'America/New_York',
                         keep = c('sunrise', 'sunset'))) %>% 
  select(c(datetime, var1, var2, sunrise, sunset)) 

Then I would like to subset the data so that I only retain rows where datetime falls between sunrise and sunset on a given day. I tried:

myrange <- as.interval(unique(dat$sunrise), unique(dat$sunset))

dat <- dat %>% 
  filter(datetime %within% myrange)

This compiles but throws warnings and does not include all the rows of data that it should. Thanks in advance.


Solution

  • Try this:

    First, create a partial dat. I'll add date, since we need it both for the calculations and for joining the data back in.

    set.seed(42)
    dat <- tibble(datetime = seq(as.POSIXct('2020-08-03 00:00:00'), 
                                 as.POSIXct('2020-08-09 12:00:00'), 
                                 by=3600),
                  var1 = rnorm(157,2,1),
                  var2 = rnorm(157,3,5)) %>%
      mutate(date = as.Date(datetime))
    dat
    # # A tibble: 157 x 4
    #    datetime             var1   var2 date      
    #    <dttm>              <dbl>  <dbl> <date>    
    #  1 2020-08-03 00:00:00  3.37 -1.00  2020-08-03
    #  2 2020-08-03 01:00:00  1.44  0.333 2020-08-03
    #  3 2020-08-03 02:00:00  2.36  9.44  2020-08-03
    #  4 2020-08-03 03:00:00  2.63  2.12  2020-08-03
    #  5 2020-08-03 04:00:00  2.40 -2.36  2020-08-03
    #  6 2020-08-03 05:00:00  1.89  3.82  2020-08-03
    #  7 2020-08-03 06:00:00  3.51  1.19  2020-08-03
    #  8 2020-08-03 07:00:00  1.91  5.95  2020-08-03
    #  9 2020-08-03 08:00:00  4.02 10.2   2020-08-03
    # 10 2020-08-03 09:00:00  1.94 -1.96  2020-08-03
    # # ... with 147 more rows
    

    Now reduce that to the distinct dates found, get the sunrise/sunset for those dates, and join back in to the original dat (with left_join). After that, we can filter the times we need.

    distinct(dat, date) %>%
      with(., getSunlightTimes(date = date,
                               lat = 43.1, lon = -76.2, tz = 'America/New_York',
                               keep = c('sunrise', 'sunset'))) %>% 
      left_join(dat, ., by = "date") %>%
      filter(sunrise <= datetime, datetime <= sunset)
    # # A tibble: 85 x 8
    #    datetime             var1  var2 date         lat   lon sunrise             sunset             
    #    <dttm>              <dbl> <dbl> <date>     <dbl> <dbl> <dttm>              <dttm>             
    #  1 2020-08-03 06:00:00 3.51   1.19 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
    #  2 2020-08-03 07:00:00 1.91   5.95 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
    #  3 2020-08-03 08:00:00 4.02  10.2  2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
    #  4 2020-08-03 09:00:00 1.94  -1.96 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
    #  5 2020-08-03 10:00:00 3.30   5.27 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
    #  6 2020-08-03 11:00:00 4.29   3.42 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
    #  7 2020-08-03 12:00:00 0.611  7.48 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
    #  8 2020-08-03 13:00:00 1.72   1.85 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
    #  9 2020-08-03 14:00:00 1.87   7.18 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
    # 10 2020-08-03 15:00:00 2.64  -5.73 2020-08-03  43.1 -76.2 2020-08-03 05:59:13 2020-08-03 20:25:11
    # # ... with 75 more rows
    

    Note that we cannot use dplyr::between here, since that function only uses the first of its left and right arguments.