Search code examples
rdplyrdatatabletidyverse

Rows until/after last departure from increasing order (last ascending cycle)


I have data akin to multiple timeseries with timestamps. It is organized by group and has a cyclic component in which time increases within some cycles delimited by sudden changes in this increasing pattern (i.e. decreases).

I would like to keep only data (rows) that is before or after the last change in ascending trend (the last ascending cycle).

Some synthetic minimal data:

df <- 
  data.frame(group = c(rep("A", 10), rep("B", 10), rep("C", 10)),
             time = c(c(1:3, 2, 3:6, 5, 6), c(1:2, 1, 3, 7, 6:10), c(4, 3, 6, 4, 6, 7, 6, 8:10))         
  )

What I mean by the last change in ascending trend:

  library(dplyr)
  
# Just exemplying last change in monotonic increasing trend
df %>%
  dplyr::group_by(group) %>%
  dplyr::mutate(
    row_num = dplyr::row_number(),
    time_order = dplyr::case_when(time - dplyr::lag(time, n = 1) >= 0 ~ "increase",  
                                  time - dplyr::lag(time, n = 1) < 0 ~ "decrease",
                                  TRUE ~ "increase"),
    where_split = dplyr::if_else(dplyr::last(which(time_order == "decrease")) == row_num, "here", NA_character_)
  ) %>%
    print(n = Inf)
#> # A tibble: 30 x 5
#> # Groups:   group [3]
#>    group  time row_num time_order where_split
#>    <chr> <dbl>   <int> <chr>      <chr>      
#>  1 A         1       1 increase   <NA>       
#>  2 A         2       2 increase   <NA>       
#>  3 A         3       3 increase   <NA>       
#>  4 A         2       4 decrease   <NA>       
#>  5 A         3       5 increase   <NA>       
#>  6 A         4       6 increase   <NA>       
#>  7 A         5       7 increase   <NA>       
#>  8 A         6       8 increase   <NA>       
#>  9 A         5       9 decrease   here       
#> 10 A         6      10 increase   <NA>       
#> 11 B         1       1 increase   <NA>       
#> 12 B         2       2 increase   <NA>       
#> 13 B         1       3 decrease   <NA>       
#> 14 B         3       4 increase   <NA>       
#> 15 B         7       5 increase   <NA>       
#> 16 B         6       6 decrease   here       
#> 17 B         7       7 increase   <NA>       
#> 18 B         8       8 increase   <NA>       
#> 19 B         9       9 increase   <NA>       
#> 20 B        10      10 increase   <NA>       
#> 21 C         4       1 increase   <NA>       
#> 22 C         3       2 decrease   <NA>       
#> 23 C         6       3 increase   <NA>       
#> 24 C         4       4 decrease   <NA>       
#> 25 C         6       5 increase   <NA>       
#> 26 C         7       6 increase   <NA>       
#> 27 C         6       7 decrease   here       
#> 28 C         8       8 increase   <NA>       
#> 29 C         9       9 increase   <NA>       
#> 30 C        10      10 increase   <NA>

Created on 2022-05-17 by the reprex package (v2.0.1)

For ease of verification, I provide my solutions:

# All rows until last change in trend, by group
check_until <- 
  df %>%
  dplyr::group_by(group) %>%
  dplyr::mutate(
    row_num = dplyr::row_number(),
    time_order = dplyr::case_when(time - dplyr::lag(time, n = 1) >= 0 ~ "increase",  
                                  time - dplyr::lag(time, n = 1) < 0 ~ "decrease",
                                  TRUE ~ "increase")) %>%
      dplyr::slice(1:dplyr::last(which(time_order == "decrease"))) %>%
    dplyr::select(-c(row_num, time_order))
# All rows after last change in trend, by group
check_after <- 
  df %>%
  group_by(group) %>%
  dplyr::mutate(
    row_num = dplyr::row_number(),
    time_order = dplyr::case_when(time - lag(time, n = 1) >= 0 ~ "increase",  
                                  time - lag(time, n = 1) < 0 ~ "decrease",
                                  TRUE ~ "increase")) %>%
  dplyr::slice(dplyr::last(which(time_order == "decrease")):max(row_num)) %>%
  dplyr::select(-c(row_num, time_order)) 

My solutions work but they seem too verbose and inefficient. I am sure there are more elegant solutions. Any insights are welcomed and I am also open to datatable solutions.


Solution

  • Both scenarios can be achieved by cumsum + diff + slice(_max).

    (1) All rows until last change in trend:

    df %>%
      group_by(group) %>%
      slice(1:which.max(cumsum(c(1, diff(time) < 0)))) %>%
      ungroup()
    
    # # A tibble: 22 × 2
    #    group  time
    #    <chr> <dbl>
    #  1 A         1
    #  2 A         2
    #  3 A         3
    #  4 A         2
    #  5 A         3
    #  6 A         4
    #  7 A         5
    #  8 A         6
    #  9 A         5
    # 10 B         1
    # 11 B         2
    # 12 B         1
    # 13 B         3
    # 14 B         7
    # 15 B         6
    # 16 C         4
    # 17 C         3
    # 18 C         6
    # 19 C         4
    # 20 C         6
    # 21 C         7
    # 22 C         6
    

    (2) All rows after last change in trend:

    df %>%
      group_by(group) %>%
      slice_max(cumsum(c(1, diff(time) < 0))) %>%
      ungroup()
    
    # A tibble: 11 × 2
    #    group  time
    #    <chr> <dbl>
    #  1 A         5
    #  2 A         6
    #  3 B         6
    #  4 B         7
    #  5 B         8
    #  6 B         9
    #  7 B        10
    #  8 C         6
    #  9 C         8
    # 10 C         9
    # 11 C        10