Search code examples
rdataframerownacol

Removing both row and column of partial NA value


I have the following dataframe (s):

s<-read.table(text = "V1    V2  V3  V4  V5  V6  V7  V8  V9  V10 
  1 0   62  64  44  NA  55  81  66  57  53  
  2 0   0   65  50  NA  56  79  69  52  55  
  3 0   0   0   57  NA  62  84  76  65  59  
  4 0   0   0   0   NA  30  70  61  41  36  
  5 0   0   0   0   NA  NA  NA  NA  NA  NA  
  6 0   0   0   0   0   0   66  63  51  44  
  7 0   0   0   0   0   0   0   80  72  72  
  8 0   0   0   0   0   0   0   0   68  64  
  9 0   0   0   0   0   0   0   0   0   47  
  10    0   0   0   0   0   0   0   0   0   0   ", header = TRUE)

As can be seen row 5 and column 5 in this case includes only NA and 0 values. I would like to omit them and to keep the order of lines and columns. There might be more column and rows in the same pattern and I would like to do the same. The size of the dataframe might be changed. The final result would be:

    V1  V2  V3  V4  V6  V7  V8  V9  V10 
1   0   62  64  44  55  81  66  57  53  
2   0   0   65  50  56  79  69  52  55  
3   0   0   0   57  62  84  76  65  59  
4   0   0   0   0   30  70  61  41  36  
6   0   0   0   0   0   66  63  51  44  
7   0   0   0   0   0   0   80  72  72  
8   0   0   0   0   0   0   0   68  64  
9   0   0   0   0   0   0   0   0   47  
10  0   0   0   0   0   0   0   0   0   

Is there a way to get the omitted row and column number (in this case 5), as well?


Solution

  • We can try

    v1 <- colSums(is.na(s))
    v2 <- colSums(s==0, na.rm=TRUE)
    j1 <- !(v1>0 & (v1+v2)==nrow(s) & v2 >0)
    
    v3 <- rowSums(is.na(s))
    v4 <- rowSums(s==0, na.rm=TRUE)
    i1 <- !(v3>0 & (v3+v4)==ncol(s) & v3 >0)
    s[i1, j1]
    #   V1 V2 V3 V4 V6 V7 V8 V9 V10
    #1   0 62 64 44 55 81 66 57  53
    #2   0  0 65 50 56 79 69 52  55
    #3   0  0  0 57 62 84 76 65  59
    #4   0  0  0  0 30 70 61 41  36
    #6   0  0  0  0  0 66 63 51  44
    #7   0  0  0  0  0  0 80 72  72
    #8   0  0  0  0  0  0  0 68  64
    #9   0  0  0  0  0  0  0  0  47
    #10  0  0  0  0  0  0  0  0   0
    

    Suppose if we change one of the values in 's'

     s$V7[3] <- NA
    

    By running the above code, the output will be

    #   V1 V2 V3 V4 V6 V7 V8 V9 V10
    #1   0 62 64 44 55 81 66 57  53
    #2   0  0 65 50 56 79 69 52  55
    #3   0  0  0 57 62 NA 76 65  59
    #4   0  0  0  0 30 70 61 41  36
    #6   0  0  0  0  0 66 63 51  44
    #7   0  0  0  0  0  0 80 72  72
    #8   0  0  0  0  0  0  0 68  64
    #9   0  0  0  0  0  0  0  0  47
    #10  0  0  0  0  0  0  0  0   0
    

    NOTE: The OP's condition is includes only NA and 0 values. I would like to omit them