r, row, apply, frequency

Delete rows whose value frequency is less than x in R


I have a data frame in R like the following:

V1 V2 V3
1  2  3
1  43 54
2  34 53
3  34 51
3  43 42
...

I want to delete all rows whose value of V1 has a frequency lower than 2. So in my example the row with V1 = 2 should be deleted, because the value "2" appears only once in the column ("1" and "3" appear twice each).

I tried to add an extra column holding the frequency of V1, so that I could delete all rows where the frequency is less than 2, but with the following I only get NAs in the extra column.

data$Frequency <- table(data$V1)[data$V1]

Thanks


Solution

  • You can also consider using data.table. We first count the occurrences of each value in V1, then keep only the rows whose count is greater than 1. Finally, we remove the count column, as we no longer need it.

    library(data.table)
    
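    setDT(dat)  # convert dat to a data.table by reference
    # count rows per V1 value, keep groups with more than one row, then drop the helper column
    # (dat itself keeps the n column; run dat[, n := NULL] afterwards to drop it there too)
    dat2 <- dat[, n := .N, by = V1][n > 1][, n := NULL]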
    

    Or even quicker, thanks to RichardScriven:

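    # .I holds the row numbers of each group; idx is just a name chosen here for that column
    dat2 <- dat[dat[, .(idx = .I[.N >= 2]), by = V1]$idx]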
    > dat2
       V1 V2 V3
    1:  1  2  3
    2:  1 43 54
    3:  3 34 51
    4:  3 43 42
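
As for the NAs in the question's own attempt, a likely cause (assuming V1 is numeric in the real data) is that indexing a table with numbers selects by position rather than by name, so any V1 value larger than the number of distinct values returns NA. The small example data happens to avoid this because its values are 1, 2 and 3. A minimal base R sketch that indexes by name instead, using toy data copied from the example; `freq` and `result` are names made up for illustration:

```r
# Toy data matching the example in the question
data <- data.frame(V1 = c(1, 1, 2, 3, 3),
                   V2 = c(2, 43, 34, 34, 43),
                   V3 = c(3, 54, 53, 51, 42))

# Count how often each V1 value occurs
freq <- table(data$V1)

# Look the counts up by name (character), not by position (numeric)
data$Frequency <- as.vector(freq[as.character(data$V1)])

# Keep rows whose V1 value occurs at least twice and drop the helper column
result <- data[data$Frequency >= 2, c("V1", "V2", "V3")]
```

Here `result` should contain only the rows with V1 equal to 1 or 3, the same rows as `dat2` above.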