Drop columns with many missing values in R


I am trying to drop columns that have fewer than 5 valid (non-missing) values. Here is an example dataset.

df <- data.frame(id = c(1,2,3,4,5,6,7,8,9,10),
                 i1 = c(0,1,1,1,1,0,0,1,NA,1),
                 i2 = c(1,0,0,1,0,1,1,0,0,NA),
                 i3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,0),
                 i4 = c(NA,1,NA,NA,NA,NA,NA,NA,1,NA))

> df
   id i1 i2 i3 i4
1   1  0  1 NA NA
2   2  1  0 NA  1
3   3  1  0 NA NA
4   4  1  1 NA NA
5   5  1  0 NA NA
6   6  0  1 NA NA
7   7  0  1 NA NA
8   8  1  0 NA NA
9   9 NA  0 NA  1
10 10  1 NA  0 NA

In this case, columns i3 and i4 need to be dropped from the data frame.

How can I get the desired dataset below?

> df
   id i1 i2 
1   1  0  1 
2   2  1  0 
3   3  1  0
4   4  1  1 
5   5  1  0 
6   6  0  1 
7   7  0  1 
8   8  1  0 
9   9 NA  0 
10 10  1 NA 

Solution

  • You can keep columns with at least 5 non-missing values with:

    df[colSums(!is.na(df)) >= 5]
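
To see why the one-liner works, it may help to break it into steps. The sketch below reuses the example data from the question; the names valid_counts, keep, and df_kept are only illustrative and do not come from the original answer.

```r
# Example data from the question
df <- data.frame(id = c(1,2,3,4,5,6,7,8,9,10),
                 i1 = c(0,1,1,1,1,0,0,1,NA,1),
                 i2 = c(1,0,0,1,0,1,1,0,0,NA),
                 i3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,0),
                 i4 = c(NA,1,NA,NA,NA,NA,NA,NA,1,NA))

# !is.na(df) is a logical matrix (TRUE where a value is present);
# colSums() then counts the non-missing values in each column.
valid_counts <- colSums(!is.na(df))
print(valid_counts)
# id i1 i2 i3 i4
# 10  9  9  1  2

# Logical vector marking the columns that meet the threshold
keep <- valid_counts >= 5

# Indexing a data frame with a single logical vector selects columns
df_kept <- df[keep]
print(names(df_kept))
# [1] "id" "i1" "i2"
```

Because the single-bracket form df[keep] selects columns as list elements, the result stays a data frame even if only one column survives the filter. The threshold 5 is the value from the question and can be changed as needed.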