r, dplyr, na, magrittr

Filter for complete cases in data.frame using dplyr (case-wise deletion)


Is it possible to filter a data.frame for complete cases using dplyr? complete.cases with a list of all variables works, of course. But that is a) verbose when there are a lot of variables and b) impossible when the variable names are not known (e.g. in a function that processes any data.frame).

library(dplyr)
df = data.frame(
    x1 = c(1,2,3,NA),
    x2 = c(1,2,NA,5)
)

df %>%
  filter(complete.cases(x1, x2))

Solution

  • Try this:

    df %>% na.omit
    

    or this:

    df %>% filter(complete.cases(.))
    

    or this:

    library(tidyr)
    df %>% drop_na
    

    If you want to filter based on one variable's missingness, use a conditional:

    df %>% filter(!is.na(x1))
    

    or

    df %>% drop_na(x1)
    

    Other answers report that, of the solutions above, na.omit is much slower. Weigh that against the fact that na.omit records the row indices of the omitted rows in the na.action attribute of its result, which the other solutions do not.

    str(df %>% na.omit)
    ## 'data.frame':   2 obs. of  2 variables:
    ##  $ x1: num  1 2
    ##  $ x2: num  1 2
    ##  - attr(*, "na.action")= 'omit' Named int  3 4
    ##    ..- attr(*, "names")= chr  "3" "4"
    
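If the omitted rows matter, the na.action attribute shown in the str output can be read back with base R's na.action(). The following is a minimal sketch using the question's data; the variable names cleaned, dropped and dropped_rows are only illustrative.

```r
# Same example data as in the question.
df <- data.frame(
  x1 = c(1, 2, 3, NA),
  x2 = c(1, 2, NA, 5)
)

cleaned <- na.omit(df)

# na.action() returns the "na.action" attribute that na.omit() attached.
dropped <- na.action(cleaned)

# Positions of the rows that were removed from the original data.
dropped_rows <- as.integer(dropped)

# Look up the removed rows in the original data frame.
df[dropped_rows, ]
```

This only works on the result of na.omit; a data frame produced by filter(complete.cases(.)) or drop_na carries no such attribute.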

    Update: revised to reflect the latest versions of dplyr and tidyr, and the comments.
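For the case raised in the question, a function that must handle any data.frame without knowing its column names, the complete.cases approach can be wrapped as sketched below. The helper keep_complete and its cols argument are hypothetical names chosen for illustration, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper (name and signature are illustrative, not a dplyr API).
# Keeps the rows that have no NA in `cols`; by default every column is checked.
keep_complete <- function(data, cols = names(data)) {
  # Compute the row mask outside filter() so that a column which happens
  # to be called "data" or "cols" cannot shadow the function arguments.
  keep <- complete.cases(data[cols])
  # !! injects the precomputed logical vector into the filter expression.
  filter(data, !!keep)
}

df <- data.frame(
  x1 = c(1, 2, 3, NA),
  x2 = c(1, 2, NA, 5)
)

all_complete <- keep_complete(df)        # rows with no NA in any column
x1_complete  <- keep_complete(df, "x1")  # rows where x1 is not NA
```

With no cols argument this should behave like df %>% filter(complete.cases(.)); passing a character vector of column names restricts the check to those columns, similar in spirit to drop_na(x1).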