Search code examples
rdatefinancelubridate

Using lubridate to extract time periods from in R


Context

As a followup to Exclude specific time periods in R

str(databank[[1]])
'data.frame':   987344 obs. of  13 variables:
 $ Date      : Factor w/ 43 levels "01/03/2017","02/03/2017",..: 17 17 17 17 17 17 17 17 17 17 ...
 $ Time      : Factor w/ 23400 levels "01:00:00 PM",..: 15344 15343 15342 15341 15340 15339 15338 15337 15336 15335 ...
 $ Bar.      : Factor w/ 63033 levels "","1/63032","10/63032",..: 58929 1 1 1 1 1 1 1 58928 1 ...
 $ Bar.Index : int  0 NA NA NA NA NA NA NA -1 NA ...
 $ Tick.Range: int  5 NA NA NA NA NA NA NA 0 NA ...
 $ Open      : num  16.9 NA NA NA NA ...
 $ High      : num  16.9 NA NA NA NA ...
 $ Low       : num  16.9 NA NA NA NA ...
 $ Close     : num  16.9 NA NA NA NA ...
 $ Vol       : num  900 0 0 0 0 0 0 0 100 0 ...
 $ MACDHist  : num  -137 NA NA NA NA ...
 $ MACD      : num  -225 NA NA NA NA ...
 $ MACDSig   : num  -87.9 NA NA NA NA ...

head(databank[[1]])
Date        Time        Bar. Bar.Index Tick.Range  Open  High  Low Close
1 12/04/2017 10:45:43 AM 63032/63032         0          5 16.95 16.95 16.9 16.95
2 12/04/2017 10:45:42 AM                    NA         NA    NA    NA   NA    NA
3 12/04/2017 10:45:41 AM                    NA         NA    NA    NA   NA    NA
4 12/04/2017 10:45:40 AM                    NA         NA    NA    NA   NA    NA
5 12/04/2017 10:45:39 AM                    NA         NA    NA    NA   NA    NA
6 12/04/2017 10:45:38 AM                    NA         NA    NA    NA   NA    NA
  Vol MACDHist    MACD MACDSig
1 900  -136.77 -224.68  -87.91
2   0       NA      NA      NA
3   0       NA      NA      NA
4   0       NA      NA      NA
5   0       NA      NA      NA
6   0       NA      NA      NA

Problem

I attempted to implement the top answer's lubridate method using:

test1 <- databank[[1]][hour(d) == 9 & minute(d) > 30,] 

But it only returns times from 9:30:00 to 9:59:59, to get times from 9:35:00 to 15:55:00...

Things I tried

test1 <- databank[[1]][hour(d) == 9 & minute(d) > 30, hour(d) == 15 & minute(d) < 55]

and

test1 <- databank[[1]][hour(d) == 9 & minute(d) > 30 & hour(d) == 15 & minute(d) < 55, ] 

but the former returns an empty table with ~79,000 blank rows (only has the entry number) and no headers and the latter, an empty table with just the headers. I thought that it is an issue because my date and times are not in POSIX but ran into trouble into converting them...

What am I missing?


Solution

  • The data sample databank[[1]] given in the actual question (here) is different from the situation in the referenced question Exclude specific time periods in R (there):

    1. The timestamp there had already been converted to class POSIXct while the Dateand Timehere are in separate factor columns.
    2. Here, Time uses a 12 hour clock with AM/PM indicator.

    It might be possible to work with the factor levels of Time but this is unreliable. So, the safest way IMHO is to create a POSIXct timestamp from the Date and Time columns and to select by time of day (without date) later on.

    Add time stamp

    databank[[1L]]$datetime <- 
      with(databank[[1L]], as.POSIXct(paste(Date, Time), "%d/%m/%Y %I:%M:%S %p", tz = "GMT"))
    

    Add time of day

    For convenience, a time_of_day (without date) column is added as character:

    databank[[1L]]$time_of_day <- 
      with(databank[[1L]], format(datetime, "%T"))
    
    databank[[1L]][, c("Date", "Time", "datetime", "time_of_day")]
    #         Date        Time            datetime time_of_day
    #1: 12/04/2017 10:45:43 AM 2017-04-12 10:45:43    10:45:43
    #2: 12/04/2017 10:45:42 AM 2017-04-12 10:45:42    10:45:42
    #3: 12/04/2017 10:45:41 AM 2017-04-12 10:45:41    10:45:41
    #4: 12/04/2017 10:45:40 AM 2017-04-12 10:45:40    10:45:40
    #5: 12/04/2017 10:45:39 AM 2017-04-12 10:45:39    10:45:39
    #6: 12/04/2017 10:45:38 AM 2017-04-12 10:45:38    10:45:38
    #7: 12/04/2017 10:45:00 PM 2017-04-12 22:45:00    22:45:00
    

    Note that I've added a PM time for illustration.

    Select rows by time of day range

    databank[[1L]][time_of_day >= "09:35:00" & time_of_day < "15:55:00", ]