Search code examples
rtidyversecase-when

How to apply a function within a group that depends on which row within that group holds a value?


I have a dataset that looks like the following, where each ID has 3 levels, and where one of those levels has a value (and all other levels within that ID are 0):

ID level value
1  1     0
1  2     0
1  3     1
2  1     0
2  2     1
2  3     0

I need to return a similar dataframe, with an additional column which specifies which row within the ID has the value 1. In this case:

ID level value which
1  1     0     3
1  2     0     0
1  3     1     0
2  1     0     2
2  2     1     0
2  3     0     0

I feel like I should be able to create this somehow by group_by(ID) and then a mutate based on a case_when that refers to the rows relative to the group (i.e. if it is the 1st, 2nd, or 3rd row), but I can't crack how that should work.

Any suggestions are much appreciated!


Solution

  • You can use which or better which.max which is guaranteed to return only 1 value.

    library(dplyr)
    
    df %>%
      group_by(ID) %>%
      mutate(which = which.max(value) * +(row_number() == 1))
    
    #     ID level value which
    #  <int> <int> <int> <int>
    #1     1     1     0     3
    #2     1     2     0     0
    #3     1     3     1     0
    #4     2     1     0     2
    #5     2     2     1     0
    #6     2     3     0     0
    

    +(row_number() == 1) is to ensure that the value of which is assigned to only 1st row in the group and rest all the rows are 0.