Search code examples
rdataframedata-manipulation

how to remove rows in a dataframe that contains all zeros or NAs or in combination of zeros and NAs in R


I have a large data frame with 10000000 rows and 150 columns. in the dataset, there are specific rows that contain all zeros or all NAs or a combination of zeros and NAs. the sample dataframe is shown below

df <- data.frame(x = c('q', 'w', 'e', 'r','t', 'y'), a = c('a','b','c','d','e','f'), b = 
c(0,1,2,3,0,5), c= c(0,3,2,4,0,'NA'), d=c(0,2,5,7,'NA',5), e = c(0,5,'NA',3,0,'NA'), f = 
c(0,7,4,3,'NA',7))

the desired output is as follows

df1 <- data.frame(x = c('w', 'e', 'r','y'), a = c('b','c','d','f'), b = c(1,2,3,5), c= 
c(3,2,4,'NA'), d=c(2,5,7,5), e = c(5,'NA',3,'NA'), f = c(7,4,3,7)) 

i.e.

df <- 
w b 1 3  2 5  7
e c 2 2  5 NA 4
r d 3 4  7 3  3
y f 5 NA 5 NA 7

I tried multiple possible solutions in the stackover flow such as

df %>% 
  filter(if_all(everything(), ~ !is.na(.x))) 

or

df %>%

filter_if(is.numeric, ~ !is.na(.))

but could not solve the problem


Solution

  • You can use apply() rowwise, combining all() and na.omit()

    df[apply(df[,-c(1,2)],1,\(r) all(na.omit(r)!=0)),]
    

    Output:

     x a b  c d  e f
    2 w b 1  3 2  5 7
    3 e c 2  2 5 NA 4
    4 r d 3  4 7  3 3
    6 y f 5 NA 5 NA 7