Search code examples
rlubridatedplyr

Filter out all rows of a df below a certain value of a particular column


I have a df which has two columns, time and val. The df is arranged by time. I want to filter out all the rows from the max value, in this case it is 1.29. I have provided example below:

library(tidyverse)
library(lubridate)


# This is the entire df
df1 <- tibble::tribble(
  ~date, ~val,
  "2021-09-16 11:02:45", 1.21,
  "2021-09-16 11:02:45", 1.21,
  "2021-09-16 11:02:45", 1.21,
  "2021-09-16 11:02:45", 1.22,
  "2021-09-16 11:02:45", 1.22,
  "2021-09-16 11:02:45", 1.22,
  "2021-09-16 11:02:37", 1.22,
  "2021-09-16 10:59:29", 1.29,
  "2021-09-16 10:59:14", 1.29,
  "2021-09-16 10:59:14", 1.28,
  "2021-09-16 10:59:14", 1.28,
  "2021-09-16 10:58:17", 1.28,
  "2021-09-16 10:58:17", 1.28,
  "2021-09-16 10:58:05", 1.26,
  "2021-09-16 10:58:05", 1.26,
  "2021-09-16 10:58:05", 1.23,
  "2021-09-16 10:57:16", 1.23
  
  ) %>%
  mutate(date = ymd_hms(date))


# This is the outcome I am looking for
tibble::tribble(
  ~date, ~val,
  "2021-09-16 10:59:29", 1.29,
  "2021-09-16 10:59:14", 1.29,
  "2021-09-16 10:59:14", 1.28,
  "2021-09-16 10:59:14", 1.28,
  "2021-09-16 10:58:17", 1.28,
  "2021-09-16 10:58:17", 1.28,
  "2021-09-16 10:58:05", 1.26,
  "2021-09-16 10:58:05", 1.26,
  "2021-09-16 10:58:05", 1.23,
  "2021-09-16 10:57:16", 1.23
  
) %>%
  mutate(date = ymd_hms(date))

How to do this efficiently, any ideas?


Solution

  • If I understood correctly this might solve your problem

    library(dplyr)
    
    df1 %>% 
      filter(date <= first(date[val == max(val)]))
    
    # A tibble: 10 x 2
       date                  val
       <dttm>              <dbl>
     1 2021-09-16 10:59:29  1.29
     2 2021-09-16 10:59:14  1.29
     3 2021-09-16 10:59:14  1.28
     4 2021-09-16 10:59:14  1.28
     5 2021-09-16 10:58:17  1.28
     6 2021-09-16 10:58:17  1.28
     7 2021-09-16 10:58:05  1.26
     8 2021-09-16 10:58:05  1.26
     9 2021-09-16 10:58:05  1.23
    10 2021-09-16 10:57:16  1.23