Search code examples
rdataframemaxposixct

how to find the maximum value of a column dataframe in a specific POSIXct time interval in r?


In my dataframe CORtrial I have two columns, rDate in POSixct format which goes from 2015-07-27 17:45:00 until 2017-08-31 16:55:00 and REFN630 in numeric format. A value of REFN630 is recorded in rDate in a time interval of 5 in 5 minutes.

That's the structure of my dataframe:

dput(head(CORtrial,10))
structure(list(rDate = structure(c(1438019100, 1438019400, 1438019700, 
1438020000, 1438020300, 1438020600, 1438020900, 1438021200, 1438021500, 
1438021800), class = c("POSIXct", "POSIXt"), tzone = ""), REFN630 = c(0.0111940298507463, 
0.0671936758893281, 0.0143198090692124, 0.0087719298245614, 0.00936768149882904, 
0.00985221674876847, 0.00775193798449612, 0.00815217391304348, 
0.00859598853868195, 0.00911854103343465)), row.names = c(NA, 
-10L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000024f693c1ef0>)

I want to know the max value between a specific date and time interval, for instance, I want to know the MAXIMUM value of REFN630 between the rDate 2015-10-23 11:25:00 and 2015-10-23 11:40:00. How can I do this?

2) After getting the maximum value throughout:

df %>%
  filter(ymd_hms(rDate) %within% 
           interval(ymd_hms("2015-07-27 22:25:00"), ymd_hms("2015-07-27 22:50:00"))) %>%
  slice_max(order_by = REFN630)

                 rDate    REFN630
1: 2015-07-27 22:25:00 0.01431981

More columns were added to the dataframe and the structure is the following:

 dput(head(COR_trial,10))
structure(list(rDate = structure(c(1438015500, 1438015800, 1438016100, 
1438016400, 1438016700, 1438017000, 1438017300, 1438017600, 1438017900, 
1438018200), class = c("POSIXct", "POSIXt"), tzone = ""), REFN532 = c(0.0127971, 
0.1348315, 0.0215983, 0.0143443, 0.0150862, 0.0158014, 0.0167866, 
0.0152284, 0.0162162, 0.0172911), REFN570 = c(0.0172414, 0.1515748, 
0.0171306, 0.0149573, 0.0157303, 0.0142518, 0.0150376, 0.016, 
0.0142045, 0.0151976), REFN630 = c(0.011194, 0.0671937, 0.0143198, 
0.0087719, 0.0093677, 0.0098522, 0.0077519, 0.0081522, 0.008596, 
0.0091185), REFN800 = c(0.0169082, 0.1030928, 0.0560472, 0.0569801, 
0.0574018, 0.0573248, 0.0531561, 0.0520833, 0.0510949, 0.0498084
)), row.names = c(NA, 10L), class = "data.frame")

Now when I use the code to get the maximum of REFN630, this is what I get:

COR_trial %>%
   filter(ymd_hms(rDate) %within% 
            interval(ymd_hms("2015-07-27 16:00:00"), ymd_hms("2015-07-27 18:00:00"))) %>%
   slice_max(order_by = REFN630)
                rDate   REFN532   REFN570   REFN630   REFN800
1 2015-07-27 17:50:00 0.1348315 0.1515748 0.0671937 0.1030928

The desired output would be:

 rDate   REFN630 
1 2015-07-27 17:50:00 0.0671937

How can I do this? Thanks in advance.


Solution

  • I think you can use the following solution. I needed to change the interval dates as the ones you mentioned on your question were not present in the data set and would lead to an empty data set after filtering. Here we first transform our preferred dates to Date class within interval function from lubridate package. Then we filter our data set for only those rDates that fall within this interval.

    library(dplyr)
    library(lubridate)
    
    df %>%
      filter(ymd_hms(rDate) %within% 
               interval(ymd_hms("2015-07-27 22:25:00"), ymd_hms("2015-07-27 22:50:00"))) %>%
      summarise(Maximum = max(REFN630, na.rm = TRUE))
    
         Maximum
    1 0.01431981
    

    In order to have only the name of the column that contains max value:

    library(tidyr)
    
    df %>%
      filter(ymd_hms(rDate) %within% 
               interval(ymd_hms("2015-07-27 21:15:00"), ymd_hms("2015-07-27 21:30:00"))) %>%
      select(rDate, REFN630) %>%
      slice_max(order_by = REFN630)
    
                    rDate   REFN630
    1 2015-07-27 21:20:00 0.0671937