Tags: r, dataframe, rows, data-wrangling

Remove rows of a certain value, before values change in R


I have a data frame like the following:

dat <- data.frame(Target = c(rep("01", times = 8), rep("02", times = 5),
                             rep("03", times = 4)),
                  targ2clicks = c(1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1,
                                  0, 0, 0, 1))

    Target targ2clicks
1      01           1
2      01           1
3      01           1
4      01           1
5      01           0
6      01           0
7      01           0
8      01           1
9      02           1
10     02           0
11     02           0
12     02           0
13     02           1
14     03           0
15     03           0
16     03           0
17     03           1

For each Target whose first targ2clicks value is 1, I want to remove the leading run of 1s, i.e. every row for that Target before its first 0. Where the first value for a Target is 0, I want to keep all of its rows.

What I want to end up with is:

   Target targ2clicks
     01           0
     01           0
     01           0
     01           1
     02           0
     02           0
     02           0
     02           1
     03           0
     03           0
     03           0
     03           1

If every value for a Target is 1 with no 0s (this case isn't in the example data, but solutions should handle it), all rows for that Target should be removed.

I have tried coding this in various ways without success. Any help is hugely appreciated.


Solution

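  • You could use ave() with cumsum() to count, within each Target, how many 0s have appeared so far, and keep only the rows where that count is positive. A Target with no 0s never reaches a positive count, so all of its rows are dropped: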

    dat[with(dat, ave(targ2clicks == 0, Target, FUN = cumsum)) > 0, ]
    
    #    Target targ2clicks
    # 5      01           0
    # 6      01           0
    # 7      01           0
    # 8      01           1
    # 10     02           0
    # 11     02           0
    # 12     02           0
    # 13     02           1
    # 14     03           0
    # 15     03           0
    # 16     03           0
    # 17     03           1
    

    Its dplyr equivalent uses cumany(), which is TRUE from the first 0 onward within each Target:

    library(dplyr)
    
    dat %>%
      group_by(Target) %>%
      filter(cumany(targ2clicks == 0)) %>%
      ungroup()
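To make the ave() + cumsum() filter easier to follow, here is a sketch that stores the running count of 0s as a helper column before filtering. The column name zeros_seen and the extra Target "04" (all 1s) are hypothetical additions for illustration only; "04" is there to check the all-1s case described in the question.

```r
# Example data from the question, plus a hypothetical Target "04"
# that contains only 1s (the edge case described in the question)
dat <- data.frame(
  Target = c(rep("01", times = 8), rep("02", times = 5),
             rep("03", times = 4), rep("04", times = 3)),
  targ2clicks = c(1, 1, 1, 1, 0, 0, 0, 1,
                  1, 0, 0, 0, 1,
                  0, 0, 0, 1,
                  1, 1, 1)
)

# Hypothetical helper column: running count of 0s seen so far within each Target
dat$zeros_seen <- with(dat, ave(targ2clicks == 0, Target, FUN = cumsum))
dat

# Keep rows where at least one 0 has already appeared for that Target,
# and drop the helper column from the result
res <- dat[dat$zeros_seen > 0, c("Target", "targ2clicks")]
res
```

For Target "04" the count stays at 0 on every row, so none of its rows survive the filter, while Targets "01" to "03" give the same 12 rows as the expected output.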