Consider this sample data frame:
df <- data.frame(v0 = c(1, 2, 5, 1, 2, 0, 1, 2, 2, 2, 5), v1 = c('a', 'a', 'a', 'b', 'b', 'c', 'c', 'b', 'b', 'a', 'a'), v2 = c(0, 10, 5, 1, 8, 5, 10, 3, 3, 1, 5))
For a large data frame: if any row has v0 > 4, I want to drop every row that shares that row's v1 value (that is, drop the whole group).
So here the result should drop all the rows where v1 is "a", because at least one "a" row has v0 = 5.
df_ExpectedResult <- data.frame(v0 = c(1, 2, 0, 1, 2, 2), v1 = c('b', 'b', 'c', 'c', 'b', 'b'), v2 = c(1, 8, 5, 10, 3, 3))
I would also like a second data frame that lists the groups that were dropped:
df_Dropped <- data.frame(v1 = 'a')
How would you do this efficiently for a huge dataset? I am using a simple for loop and if statement, but it takes too long to do the manipulation.
One option with dplyr is to group by v1 and keep only the groups in which no v0 exceeds 4:
library(dplyr)
df %>%
  group_by(v1) %>%
  # keep a group only if it has zero rows with v0 > 4
  filter(sum(v0 > 4) < 1) %>%
  ungroup()
Output:
# A tibble: 6 x 3
#      v0 v1       v2
#   <dbl> <chr> <dbl>
# 1     1 b         1
# 2     2 b         8
# 3     0 c         5
# 4     1 c        10
# 5     2 b         3
# 6     2 b         3
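The question also asks for a data frame of the dropped groups, which the pipeline above does not produce. Below is a minimal base R sketch of one way to get both results without a loop. It is an alternative to the dplyr approach, not part of it, and the names dropped_groups, df_kept and df_dropped are made up for illustration. It assumes the condition is simply v0 > 4 and that v1 has no missing values.

```r
# Sample data from the question.
df <- data.frame(
  v0 = c(1, 2, 5, 1, 2, 0, 1, 2, 2, 2, 5),
  v1 = c('a', 'a', 'a', 'b', 'b', 'c', 'c', 'b', 'b', 'a', 'a'),
  v2 = c(0, 10, 5, 1, 8, 5, 10, 3, 3, 1, 5)
)

# Groups to drop: every v1 value that has at least one row with v0 > 4.
# which() skips rows where v0 is NA instead of propagating NA.
dropped_groups <- unique(df$v1[which(df$v0 > 4)])

# Rows whose group is not in the drop list.
df_kept <- df[!df$v1 %in% dropped_groups, ]

# One row per dropped group (hypothetical name, mirrors df_Dropped above).
df_dropped <- data.frame(v1 = dropped_groups)
```

Both steps are vectorised, so the data is scanned a fixed number of times rather than once per group, which is usually where a for loop with an if statement loses time. Note that df_kept keeps the original row names (4 to 9 here); rownames(df_kept) <- NULL resets them if that matters.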