Search code examples
rdplyrlogical-operatorsdata-wrangling

Identify rows with a value greater than threshold, but only direct one above per group


Suppose we have a dataset with a grouping variable, a value, and a threshold that is unique per group. Say I want to identify a value that is greater than a threshold, but only one.

test <- data.frame(
  grp = c("A", "A", "A", "B", "B", "B"),
  value = c(1, 3, 5, 1, 3, 5),
  threshold = c(4,4,4,2,2,2)
)

want <- data.frame(
  grp = c("A", "A", "A", "B", "B", "B"),
  value = c(1, 3, 5, 1, 3, 5),
  threshold = c(4,4,4,2,2,2),
  want = c(NA, NA, "yes", NA, "yes", NA)
)

In the table above, Group A has a threshold of 4, and only value of 5 is higher. But in Group B, threshold is 2, and both value of 3 and 5 is higher. However, only row with value of 3 is marked.

enter image description here

I was able to do this by identifying which rows had value greater than threshold, then removing the repeated value:

library(dplyr)
test %>%
  group_by(grp) %>%
  mutate(want = if_else(value > threshold, "yes", NA_character_)) %>%
  mutate(across(want, ~replace(.x, duplicated(.x), NA)))

enter image description here

I was wondering if there was a direct way to do this using a single logical statement rather than doing it two-step method, something along the line of:

test %>%
  group_by(grp) %>%
  mutate(want = if_else(???, "yes", NA_character_))

The answer doesn't have to be on R either. Just a logical step explanation would suffice as well. Perhaps using a rank?

Thank you!


Solution

  • Here is more direct way:

    The essential part: With min(which((value > threshold) == TRUE) we get the first TRUE in our column,

    Next we use an ifelse and check the number we get to the row number and set the conditions:

    library(dplyr)
    
    test %>%
      group_by(grp) %>% 
      mutate(want = ifelse(row_number()==min(which((value > threshold) == TRUE)),
                           "yes", NA_character_))
    
     grp   value threshold want 
      <chr> <dbl>     <dbl> <chr>
    1 A         1         4 NA   
    2 A         3         4 NA   
    3 A         5         4 yes  
    4 B         1         2 NA   
    5 B         3         2 yes  
    6 B         5         2 NA   
    >