I am trying to drop columns that have fewer than 5 valid (non-NA) values. Here is an example dataset.
df <- data.frame(id = c(1,2,3,4,5,6,7,8,9,10),
                 i1 = c(0,1,1,1,1,0,0,1,NA,1),
                 i2 = c(1,0,0,1,0,1,1,0,0,NA),
                 i3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,0),
                 i4 = c(NA,1,NA,NA,NA,NA,NA,NA,1,NA))
> df
id i1 i2 i3 i4
1 1 0 1 NA NA
2 2 1 0 NA 1
3 3 1 0 NA NA
4 4 1 1 NA NA
5 5 1 0 NA NA
6 6 0 1 NA NA
7 7 0 1 NA NA
8 8 1 0 NA NA
9 9 NA 0 NA 1
10 10 1 NA 0 NA
In this case, columns i3 and i4 need to be dropped from the data frame.
How can I get the desired dataset below?
> df
id i1 i2
1 1 0 1
2 2 1 0
3 3 1 0
4 4 1 1
5 5 1 0
6 6 0 1
7 7 0 1
8 8 1 0
9 9 NA 0
10 10 1 NA
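You can keep the columns with at least 5 non-missing values by counting the non-NA entries per column with colSums(!is.na(df)) and using the result as a logical column index: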
df[colSums(!is.na(df)) >= 5]
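To make the one-liner easier to follow, here is a step-by-step sketch of the same idea on the example data. The names min_valid, valid_counts, keep, and result are only illustrative and do not appear in the question; drop = FALSE is an optional safeguard I am assuming you would want, so that the result stays a data frame even if only one column survives.

```r
# Example data from the question.
df <- data.frame(id = c(1,2,3,4,5,6,7,8,9,10),
                 i1 = c(0,1,1,1,1,0,0,1,NA,1),
                 i2 = c(1,0,0,1,0,1,1,0,0,NA),
                 i3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,0),
                 i4 = c(NA,1,NA,NA,NA,NA,NA,NA,1,NA))

# Illustrative threshold: minimum number of non-NA values a column needs.
min_valid <- 5

# !is.na(df) is a logical matrix (TRUE where a value is present);
# colSums() adds up the TRUEs, giving the non-NA count per column.
valid_counts <- colSums(!is.na(df))

# Logical vector, one entry per column: TRUE means "keep this column".
keep <- valid_counts >= min_valid

# Subset the columns; drop = FALSE keeps the result a data frame.
result <- df[, keep, drop = FALSE]

names(result)
```

On this data the per-column counts are 10, 9, 9, 1, and 2, so only id, i1, and i2 pass the threshold. Note that id is counted like any other column; it is kept here only because it has no missing values.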