Let's say this is my df:
people <- c(1,1,1,2,2,3,3,4,4,5,5)
activity <- c(1,1,1,2,2,3,4,5,5,6,6)
completion <- c(0,0,1,0,1,1,1,0,0,0,1)
I would like to remove all people who never completed any activity.
I have tried the code below, but it does not work, and I have no idea what is wrong.
nevercompleted <- df %>%
  filter(completion != 0) %>%
  group_by(people) %>%
  summarise("frequency activity" = n())
df <- -c(df$nevercompleted)
So, in this scenario person 4 should be removed from the df. Note that I only want to remove people who never completed anything, like person 4, and not someone like person 1, who completes an activity at some point.
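For comparison, here is a minimal sketch of how the attempt above could be repaired with small changes (a suggestion, not the asker's original code; it assumes dplyr is installed). Despite its name, nevercompleted ends up holding the people who did complete at least one activity, because filter(completion != 0) keeps only the completed rows. Those ids are therefore the ones to keep rather than remove. The name completed_ids is hypothetical.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  people     = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  activity   = c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6),
  completion = c(0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1)
)

# Rows with a completed activity; the distinct people among them are
# the ones who completed something at least once (1, 2, 3 and 5 here)
completed_ids <- df %>%
  filter(completion != 0) %>%
  distinct(people)

# Keep every row that belongs to one of those people; person 4 drops out
df <- df %>%
  filter(people %in% completed_ids$people)

print(df)
```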
In base R, ave() can check, person by person, whether any activity was completed. The following can easily be rewritten as a one-liner.
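# For each person, TRUE on all of their rows if any completion is non-zero
i <- ave(as.logical(df$completion), df$people, FUN = function(x) any(x != 0, na.rm = TRUE))
# which() keeps only the positions where i is TRUE
df <- df[which(i), ]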
df
# people activity completion
#1 1 1 0
#2 1 1 0
#3 1 1 1
#4 2 2 0
#5 2 2 1
#6 3 3 1
#7 3 4 1
#10 5 6 0
#11 5 6 1
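The two lines above can be folded into a single expression. A minimal sketch, assuming the df from the Data section (rebuilt here so the snippet runs on its own): since completion is 0/1, completion != 0 is already a logical vector, and ave() with FUN = any marks every row of a person who completed at least once.

```r
df <- data.frame(
  people     = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  activity   = c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6),
  completion = c(0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1)
)

# ave() applies any() within each person and recycles the result back
# to every row of that person, giving one TRUE/FALSE per row
df <- df[ave(df$completion != 0, df$people, FUN = any), ]
df
```

Unlike the two-line version, this shortcut does not guard against NA values in completion, so it is only a drop-in replacement when that column has no missing values.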
dplyr
And here is a dplyr way.
First keep only the people who completed at least one activity, then join that list with the original data set to get back all the rows and columns for those people.
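library(dplyr)

df <- df %>%
  group_by(people) %>%
  # one row per person: TRUE if they ever completed an activity
  summarise(completion = any(as.logical(completion))) %>%
  filter(completion) %>%         # keep only people who completed something
  select(-completion) %>%        # only the people column is needed for the join
  left_join(df, by = 'people')   # bring back all rows and columns for those people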
df
#`summarise()` ungrouping output (override with `.groups` argument)
## A tibble: 9 x 3
# people activity completion
# <dbl> <dbl> <dbl>
#1 1 1 0
#2 1 1 0
#3 1 1 1
#4 2 2 0
#5 2 2 1
#6 3 3 1
#7 3 4 1
#8 5 6 0
#9 5 6 1
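A join is not strictly needed here. A minimal sketch, assuming dplyr is installed and the data from the Data section: a grouped filter() evaluates its condition within each person, so any(completion != 0) keeps or drops all of a person's rows at once and preserves the original row order.

```r
library(dplyr)

df <- data.frame(
  people     = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  activity   = c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6),
  completion = c(0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1)
)

df <- df %>%
  group_by(people) %>%
  # any() returns a single TRUE/FALSE per person, which filter()
  # applies to every row of that person
  filter(any(completion != 0)) %>%
  ungroup()

df
```

Because there is no summarise() step, this variant also avoids the grouping message shown above.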
Data
The question does not include a data.frame() call, only the creation of the column vectors, so the data frame is built here.
people <- c(1,1,1,2,2,3,3,4,4,5,5)
activity <- c(1,1,1,2,2,3,4,5,5,6,6)
completion <- c(0,0,1,0,1,1,1,0,0,0,1)
df <- data.frame(people, activity, completion)