r, tidyverse

Filter values according to a threshold given by another variable


In the following example, I am trying to create a column 'output' such that, within each ID, output is 1 for every row whose value of A is greater than or equal to the value of A in the row where B == "apple", and 0 otherwise.

df1 <- data.frame(ID = c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c"),
                  A = c(2, 1, 8, 4, 3, 12, 9, 142, 13, 8),
                  B = c("apple", "orange", "kiwi", "orange", "apple", "kiwi", "pear", "kiwi", "apple", "orange"),
                  output = c(1, 0, 1, 1, 1, 1, 1, 1, 1, 0))


df1
   ID   A      B output
1   a   2  apple      1
2   a   1 orange      0
3   a   8   kiwi      1
4   b   4 orange      1
5   b   3  apple      1
6   b  12   kiwi      1
7   b   9   pear      1
8   c 142   kiwi      1
9   c  13  apple      1
10  c   8 orange      0

The best I could come up with in base R was df1$A >= df1$A[df1$B == "apple" & df1$ID == "a"], but this hard-codes a single ID and I can't figure out how to generalize it to every ID.

Ideally, I am looking for a tidyverse solution, but base-R solutions would be ok as well.

Many thanks in advance!


Solution

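  • Split the data frame by ID, apply transform to each piece according to your logic, then bind the pieces back together with rbind. The unary + converts the logical comparison to 0/1. (The |> pipe and the formula form of split need R >= 4.1.)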

    > split(df1, ~ID) |> 
    +   lapply(transform, out1=+(A >= A[B == 'apple'])) |> 
    +   do.call(what='rbind')
         ID   A      B output out1
    a.1   a   2  apple      1    1
    a.2   a   1 orange      0    0
    a.3   a   8   kiwi      1    1
    b.4   b   4 orange      1    1
    b.5   b   3  apple      1    1
    b.6   b  12   kiwi      1    1
    b.7   b   9   pear      1    1
    c.8   c 142   kiwi      1    1
    c.9   c  13  apple      1    1
    c.10  c   8 orange      0    0
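
Since the question asks for a tidyverse solution, the same per-ID comparison could also be sketched with dplyr. This is only a minimal sketch, not part of the answer above: it assumes dplyr is installed and that each ID has exactly one row where B == "apple" (with none or several, the comparison would fail or recycle unexpectedly). The names res and out1 are just illustrative.

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(ID = c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c"),
                  A = c(2, 1, 8, 4, 3, 12, 9, 142, 13, 8),
                  B = c("apple", "orange", "kiwi", "orange", "apple",
                        "kiwi", "pear", "kiwi", "apple", "orange"),
                  output = c(1, 0, 1, 1, 1, 1, 1, 1, 1, 0))

res <- df1 |>
  group_by(ID) |>
  # Within each ID, compare every A to the A of that ID's "apple" row;
  # assumes exactly one "apple" row per ID
  mutate(out1 = as.integer(A >= A[B == "apple"])) |>
  ungroup()
```

Here group_by makes A[B == "apple"] evaluate separately within each ID, which plays the same role as split plus lapply in the base-R version, and ungroup returns an ordinary tibble with the original row order.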