I have a sample dataset which looks something similar to the one below:
d= data.frame(a = c(1,5,56,4,9),
b = c(0,0,NA,0,NA),
c = c(98,67,NA,3,7),
d = c(0,0,0,0,0),
e = c(NA,NA,NA,NA,NA))
which would be:
| a | b | c | d | e |
|----|:--:|---:|---|----|
| 1 | 0 | 98 | 0 | NA |
| 5 | 0 | 67 | 0 | NA |
| 56 | NA | NA | 0 | NA |
| 4 | 0 | 3 | 0 | NA |
| 9 | NA | 7 | 0 | NA |
I need to remove all such columns which have:
1. NA's and Zeros
2. Only Zeros
3. Only NA's
So based on the above dataset, columns b,d and e should be eliminated. So, I first need to find out which columns have such conditions and then delete them.
I went through this link Remove the columns with the colsums=0 but I'm not clear with the solution. Also, it doesn't provide me the desired output.
The final output would be:
| a | c |
|----|:--:|
| 1 | 98 |
| 5 | 67 |
| 56 | NA |
| 4 | 3 |
| 9 | 7 |
One option would be to create a logical vector with colSums
based on the number of NA
or 0 elements in each column
d[!colSums(is.na(d)|d ==0) == nrow(d)]
# a c
#1 1 98
#2 5 67
#3 56 NA
#4 4 3
#5 9 7
Or another option is to replace
all the 0s to NA
and then apply is.na
d[colSums(!is.na(replace(d, d == 0, NA))) > 0]
Or more compactly with na_if
d[colSums(!is.na(na_if(d, 0))) > 0]