r, dataframe, frequency

Removing infrequent rows in a data frame


Let's say I have the following very simple data frame:

a <- rep(5,30)
b <- rep(4,80)
d <- rep(7,55)

df <- data.frame(Column = c(a,b,d))

What would be the most generic way to remove all rows whose value appears fewer than 60 times?

I know you could say "in this case it's just a", but in my real data there are many more frequencies, so I wouldn't want to specify them one by one.

I was thinking of writing a loop that deletes the rows for any value 'i' whose length() is smaller than 60, but perhaps you have other ideas. Thanks in advance.


Solution

  • A solution using dplyr.

    library(dplyr)
    
    # Keep only the groups (values of Column) with at least 60 rows
    df2 <- df %>%
      group_by(Column) %>%
      filter(n() >= 60) %>%
      ungroup()
    

    Or a solution using base R:

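    # split() orders groups by sorted value, so sort the unique values to match
    uniqueID <- sort(unique(df$Column))
    # TRUE for each value that occurs at least 60 times
    targetID <- sapply(split(df, df$Column), function(x) nrow(x) >= 60)
    
    df2 <- df[df$Column %in% uniqueID[targetID], , drop = FALSE]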
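
Another base R option, offered as a sketch rather than as part of the original answer: ave() can compute each row's group size directly, so there is no need to line up a vector of unique values with the groups. The name group_size is introduced here only for illustration, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data from the question
a <- rep(5, 30)
b <- rep(4, 80)
d <- rep(7, 55)
df <- data.frame(Column = c(a, b, d))

# For every row, the number of rows that share its value of Column
group_size <- ave(seq_along(df$Column), df$Column, FUN = length)

# Keep only rows whose value occurs at least 60 times
df2 <- df[group_size >= 60, , drop = FALSE]

nrow(df2)           # 80
unique(df2$Column)  # 4
```

With the example data, both 5 (30 rows) and 7 (55 rows) fall below the threshold, so only the 80 rows with value 4 remain.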