Search code examples
rmissing-data

I want to delete the IDs that have no information in the remaining columns


Here is a representation of my dataset:

Number<-c(1:10)
AA<-c(head(LETTERS,4), rep(NA,6))
BB<-c(head(letters,6), rep(NA,4))
CC<-c(1:6, rep(NA,4))
DD<-c(10:14, rep(NA,5))
EE<-c(3:8, rep(NA,4))
FF<-c(6:1, rep(NA,4))
mydata<-data.frame(Number,AA,BB,CC,DD,EE,FF)

I want to delete all the IDs (Number) that have no information in the remaining columns, automatically. I want to tell the function that if there is a value in Number but there is only NA in all the remaining columns, delete the row. I must have the dataframe below:

   Number   AA   BB CC DD EE FF
1       1    A    a  1 10  3  6
2       2    B    b  2 11  4  5
3       3    C    c  3 12  5  4
4       4    D    d  4 13  6  3
5       5 <NA>    e  5 14  7  2
6       6 <NA>    f  6 NA  8  1

Solution

  • We can use if_all/if_all

    library(dplyr)
    mydata %>% 
       filter(if_any(-Number, complete.cases))
    

    -output

      Number   AA BB CC DD EE FF
    1      1    A  a  1 10  3  6
    2      2    B  b  2 11  4  5
    3      3    C  c  3 12  5  4
    4      4    D  d  4 13  6  3
    5      5 <NA>  e  5 14  7  2
    6      6 <NA>  f  6 NA  8  1
    

    or

    mydata %>%
       filter(!if_all(-Number, is.na))
    

    Or with base R

     subset(mydata, rowSums(!is.na(mydata[-1])) >0 )
      Number   AA BB CC DD EE FF
    1      1    A  a  1 10  3  6
    2      2    B  b  2 11  4  5
    3      3    C  c  3 12  5  4
    4      4    D  d  4 13  6  3
    5      5 <NA>  e  5 14  7  2
    6      6 <NA>  f  6 NA  8  1