How to filter out rows in one column when condition is met in another column for different groups?
For example:
library(dplyr)
df1 <-tribble(
~group, ~var1, ~var2,
"a", 0, 0,
"a", 1, 0,
"a",1, 0,
"a",0, 1,
"a", 1, 0,
"b", 1, 0,
"b", 0, 1,
"b", 1, 0,
"b", 0, 1)
I want to allow ones in var1
only after having the first 1
in var2
. Therefore, in this example, I would like to get:
group var1 var2
<chr> <dbl> <dbl>
a 0 0
a 0 1
a 1 0
b 0 1
b 1 0
b 0 1
I can identify from where I want to start filtering the data, but don't know exactly how to proceed:
df1 %>%
group_by(var2,group) %>%
mutate(test = case_when(row_number() == 1 & var2 == 1 ~ "exclude_previous_rows",
T ~ "n"))
I'm sure there is a simple way to do this with dplyr, but couldn't find it so far.
We can use a cumulative sum. I think this is what you want:
df1 %>%
group_by(group) %>%
filter(cumsum(var2 == 1) > 0)
# # A tibble: 5 x 3
# # Groups: group [2]
# group var1 var2
# <chr> <dbl> <dbl>
# 1 a 0 1
# 2 a 1 0
# 3 b 0 1
# 4 b 1 0
# 5 b 0 1
This will keep all rows including and after the first 1
in var2
, by group. I'm not really sure what you mean by "I want to allow ones in var1" - your code seems to ignore var1
, and mine follows suit.