I would like to remove certain people from my dataset if a condition is fulfilled. I have panel data and ideally would like to count the number of completions for every person and delete them from my dataset if a person has never completed anything.
people <- c(1,1,1,2,2,3,3,4,4,5,5)
activity <- c(1,1,1,2,2,3,4,5,5,6,6)
completion <- c(0,0,1,0,1,1,1,0,0,0,1)
df <- data.frame(people, activity, completion)
In completion, 0 indicates no completion and 1 indicates completion.
So, in this case I need to detect that person number 4 has never completed activity 5 and should therefore be removed from the dataset completely (all rows). I have tried an ifelse condition and then removing rows with subset. However, this only flags the rows where an activity was not completed, even though some of those activities are eventually completed:
df$nevercompleted <- ifelse(df$completion == 0, 1, 0)
df<-subset(df,completion == 0)
A dplyr solution:
## Load dplyr (provides %>% and re-exports tibble())
library(dplyr)

## Create the dataframe
df <- tibble(
  people = c(1,1,1,2,2,3,3,4,4,5,5),
  activity = c(1,1,1,2,2,3,4,5,5,6,6),
  completion = c(0,0,1,0,1,1,1,0,0,0,1)
)
df %>%
  ## Group observations by people
  group_by(people) %>%
  ## Create total completions per individual
  mutate(tot_completion = sum(completion)) %>%
  ## Keep only people with strictly positive number of completions
  filter(tot_completion > 0) %>%
  ## Drop the grouping so later operations act on the whole table
  ungroup()
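
The same filter can also be sketched in base R without extra packages, assuming the data sit in a data frame named df with the columns from the question. Here ave() repeats each person's completion total on every one of that person's rows; the name kept is only illustrative and not part of the original answer.

```r
# Rebuild the example data from the question
df <- data.frame(
  people     = c(1,1,1,2,2,3,3,4,4,5,5),
  activity   = c(1,1,1,2,2,3,4,5,5,6,6),
  completion = c(0,0,1,0,1,1,1,0,0,0,1)
)

# ave() returns, for every row, the sum of completion within that row's person
tot_completion <- ave(df$completion, df$people, FUN = sum)

# Keep only rows of people with at least one completion (person 4 is dropped)
kept <- df[tot_completion > 0, ]
kept
```

In dplyr the helper column can be skipped as well: group_by(people) %>% filter(any(completion == 1)) keeps every row of each person who has at least one completion.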