
Apply function to one random row per group (in specified set of groups)


I have the following data frame:

    df = data.frame(name = c("abc", "abc", "abc", "def", "def", "ghi", "ghi", "jkl", "jkl", "jkl", "jkl", "jkl"),
                    ignore = c(0,1,0,0,1,1,1,0,0,0,1,1),
                    time = 31:42)

name | ignore | time |
-----|--------|------|
abc  | 0      | 31   |
abc  | 1      | 32   |
abc  | 0      | 33   |
def  | 0      | 34   |
def  | 1      | 35   |
ghi  | 1      | 36   |
ghi  | 1      | 37   |
jkl  | 0      | 38   |
jkl  | 0      | 39   |
jkl  | 0      | 40   |
jkl  | 1      | 41   |
jkl  | 1      | 42   |

and I want to do the following:

  1. Group by name
  2. If ignore is all non-zero in a group, leave the time values as is for this group
  3. If ignore contains at least one zero in a group (e.g. where name is jkl), randomly choose one of the rows in this group where ignore is zero, and apply a function f to the time value.

For example, with f(x) = x - 30 I would expect to see something like this:

name | ignore | time |
-----|--------|------|
abc  | 0      | 1    | <- changed
abc  | 1      | 32   |
abc  | 0      | 33   |
def  | 0      | 4    | <- changed
def  | 1      | 35   |
ghi  | 1      | 36   | <- unchanged group
ghi  | 1      | 37   | <- unchanged group
jkl  | 0      | 38   |
jkl  | 0      | 39   |
jkl  | 0      | 10   | <- changed
jkl  | 1      | 41   |
jkl  | 1      | 42   |
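
The three rules above can also be written down as a small check, which is handy for testing any candidate solution since the chosen row is random. This is only a sketch: `check_result` is a hypothetical helper name, and it assumes that `f` actually changes the value it is applied to (true for f(x) = x - 30).

```r
df <- data.frame(name = c("abc", "abc", "abc", "def", "def", "ghi", "ghi",
                          "jkl", "jkl", "jkl", "jkl", "jkl"),
                 ignore = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1),
                 time = 31:42)
f <- function(x) x - 30

# Hypothetical helper: TRUE if `result` follows the three rules relative to `original`.
check_result <- function(original, result, f) {
  changed <- result$time != original$time
  # Every changed row must have ignore == 0 and hold f(original time).
  ok_rows <- all(original$ignore[changed] == 0) &&
    all(result$time[changed] == f(original$time[changed]))
  # Groups containing a zero get exactly one changed row, all other groups none.
  n_changed <- tapply(changed, original$name, sum)
  has_zero  <- tapply(original$ignore == 0, original$name, any)
  ok_rows && all(n_changed == as.integer(has_zero))
}

# The expected table from above, entered by hand.
expected <- df
expected$time <- c(1, 32, 33, 4, 35, 36, 37, 38, 39, 10, 41, 42)
check_result(df, expected, f)
```
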

I'm finding it hard to get an elegant solution to this. I am not sure how to apply a function to randomly selected rows within a group, nor what the best approach is for only applying a function to selected groups. I would ideally like to solve this via dplyr, but no problem if not.


Solution

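    library(dplyr)

    f <- function(x) x - 30
    df %>% 
      group_by(name) %>% 
      # samp: within-group position of one random row with ignore == 0, or 0 if there is none.
      # Indexing via sample.int() avoids sample(n, 1) drawing from 1:n when only one row qualifies.
      mutate(samp = if (any(ignore == 0)) which(ignore == 0)[sample.int(sum(ignore == 0), 1)] else 0L,
             time = ifelse(row_number() != samp, time, f(time))) %>% 
      select(-samp)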
    

    Output (which rows change varies between runs, since the row is chosen at random):

       name  ignore  time
       <chr>  <dbl> <dbl>
     1 abc        0     1
     2 abc        1    32
     3 abc        0    33
     4 def        0     4
     5 def        1    35
     6 ghi        1    36
     7 ghi        1    37
     8 jkl        0     8
     9 jkl        0    39
    10 jkl        0    40
    11 jkl        1    41
    12 jkl        1    42
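
If dplyr is not available, the same idea can be sketched in base R. This is a rough alternative, not part of the answer above: `pick_one` is a hypothetical helper name, and the seed is arbitrary and only there to make a run repeatable. The helper exists because `sample(x, 1)` treats a single number `x` as the range `1:x`, which would pick the wrong row whenever a group has exactly one candidate.

```r
df <- data.frame(name = c("abc", "abc", "abc", "def", "def", "ghi", "ghi",
                          "jkl", "jkl", "jkl", "jkl", "jkl"),
                 ignore = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1),
                 time = 31:42)
f <- function(x) x - 30

# Hypothetical helper: pick one element of x uniformly at random.
# Unlike sample(x, 1), this is safe when x has length one.
pick_one <- function(x) x[sample.int(length(x), 1)]

set.seed(1)  # arbitrary seed, only for repeatability

# Row numbers with ignore == 0, split by group.
# drop = TRUE discards groups without any zero (here "ghi") if name is a factor.
zero_rows  <- which(df$ignore == 0)
candidates <- split(zero_rows, df$name[zero_rows], drop = TRUE)

# One randomly chosen row number per remaining group.
chosen <- vapply(candidates, pick_one, integer(1))

# Apply f only to the chosen rows; every other row keeps its time.
result <- df
result$time[chosen] <- f(df$time[chosen])
result
```

Working with row numbers of the whole data frame, rather than positions within each group, means the chosen rows can be updated in a single assignment at the end.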