r, tidyverse, filtering

R filter by group to exclude groups that contain a threshold number of falses


Let me know if I need a reproducible example, but the question I have is pretty specific. I have a number of groups with a TRUE/FALSE column created by a case_when() call. I need to exclude groups in which more than 50% of the values in that column are FALSE. I am a little unsure how to write that filter condition. I assume I have to calculate the number of rows in each group, count the number of FALSE values, and check whether that count is greater than half the group size; I am just unsure how to actually write that line. Tidyverse solutions preferred, but as long as I can run the filter line I'm good. Thanks.


Solution

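  • I believe the following should work, but you will need to test it on your data. Because mean() of a logical vector is the proportion of TRUE values, mean(!tf_column) is the proportion of FALSE values in each group. Keeping only the groups where that proportion is at most 0.5 drops the groups that are more than 50% FALSE.

    library(dplyr)

    dataset |>
      group_by(grouping_var) |>
      filter(mean(!tf_column) <= 0.5) |> # keep groups with at most 50% FALSE
      ungroup() # if you don't want to keep the grouping afterwards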
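
Since the question did not include data, here is a minimal sketch for checking the condition on a made-up dataset. The data is hypothetical; the column names grouping_var and tf_column are simply the placeholders used in the answer. The filter keeps a group when its share of FALSE values, mean(!tf_column), is at most 0.5, which is how I read "exclude groups that have more than 50% false".

```r
library(dplyr)

# Hypothetical toy data (not from the question):
#   group "a": 1 of 4 values FALSE (25%)
#   group "b": 3 of 4 values FALSE (75%)
#   group "c": 1 of 2 values FALSE (exactly 50%)
dataset <- tibble(
  grouping_var = c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c"),
  tf_column    = c(TRUE, TRUE, TRUE, FALSE,
                   TRUE, FALSE, FALSE, FALSE,
                   TRUE, FALSE)
)

kept <- dataset |>
  group_by(grouping_var) |>
  # mean(!tf_column) is the proportion of FALSE values within the group
  filter(mean(!tf_column) <= 0.5) |>
  ungroup()

# Groups that survive the filter
unique(kept$grouping_var)
```

With this data, group "b" should be dropped, while "a" and "c" are kept. Whether a group sitting at exactly 50% FALSE should stay is a judgment call: change `<=` to `<` if such groups should go too.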