Search code examples
rdplyrtidyverse

Create column giving next occurrence of equal or smaller value in other column


I want to create a column col2 which lists the distance in rows of each element in col1 to the next value in col1 which is equal to or lower.

Where there are no more elements which are equal or lower, it give the distance to the end of the colummn.

Example dataframe:

df1 <- data.frame(col1 = c(1,2,2,1,2,3,2))

  col1
1    1
2    2
3    2
4    1
5    2
6    3
7    2

Desired output in this example:

  col1 col2
1    1    3
2    2    1
3    2    1
4    1    4
5    2    3
6    3    1
7    2    1

My attempt so far is as per below. While this seems to work when I try it with vectors, it isn't working in the function.

This is to input into a broader function which I'm trying to write.

df1 |>
  mutate(
      col2 = sapply(row_number(), \(x) {
      distance <- match(TRUE, col1[x] <= col1[(x+1):n()]) 
      distance
    })
  )

Solution

  • The main problem with the current approach is handling cases where there are no more lower/equal values - in those cases, we want the distance to the end of the dataframe. Your first 4 should be a 3, since it takes 3 rows to be at 1 again which is equal or lower.

    df1 |>
      mutate(
        col2 = sapply(seq_len(n()), \(x) {
          matches <- which(col1[-(1:x)] <= col1[x])
          if (length(matches)) matches[1] else n() - x + 1
        })
      )
    
    col1 col2
    1 3
    2 1
    2 1
    1 4
    2 2
    3 1
    2 1