Search code examples
rdataframesubsetna

Remove columns with NA's and/or Zeros Only


I have a sample dataset which looks something similar to the one below:

d= data.frame(a = c(1,5,56,4,9), 
              b = c(0,0,NA,0,NA), 
              c = c(98,67,NA,3,7), 
              d = c(0,0,0,0,0), 
              e = c(NA,NA,NA,NA,NA))

which would be:


| a  |  b |  c | d | e  |
|----|:--:|---:|---|----|
| 1  |  0 | 98 | 0 | NA |
| 5  |  0 | 67 | 0 | NA |
| 56 | NA | NA | 0 | NA |
| 4  | 0  | 3  | 0 | NA |
| 9  | NA | 7  | 0 | NA |

I need to remove all such columns which have:

1. NA's and Zeros
2. Only Zeros
3. Only NA's

So based on the above dataset, columns b,d and e should be eliminated. So, I first need to find out which columns have such conditions and then delete them.

I went through this link Remove the columns with the colsums=0 but I'm not clear with the solution. Also, it doesn't provide me the desired output.

The final output would be:

| a  |  c |
|----|:--:|
| 1  | 98 |
| 5  | 67 |
| 56 | NA |
| 4  | 3  |
| 9  | 7  |

Solution

  • One option would be to create a logical vector with colSums based on the number of NA or 0 elements in each column

    d[!colSums(is.na(d)|d ==0) == nrow(d)]
    #  a    c
    #1  1   98
    #2  5   67
    #3 56   NA
    #4  4    3
    #5  9    7
    

    Or another option is to replace all the 0s to NA and then apply is.na

    d[colSums(!is.na(replace(d, d == 0, NA))) > 0]
    

    Or more compactly with na_if

    d[colSums(!is.na(na_if(d, 0))) > 0]