R - Select all rows that have one NA value at most?


I'm trying to impute my data and keep as many observations as I can. I want to select the observations that have at most one NA value from the PimaIndiansDiabetes2 data set in the mlbench package, loaded with data(PimaIndiansDiabetes2, package = "mlbench").

For example:

Var1 Var2 Var3
1      NA   NA
2      34   NA
3      NA   NA
4      NA   55
5      NA   NA
6      40   28

What I would like returned:

Var1 Var2 Var3
2      34   NA
4      NA   55
6      40   28

The code below returns the rows that contain NA values. I know I could use merge() to join the observations with one NA value to the observations without NA values, but I'm not sure how to extract those.

na_rows <- df[!complete.cases(df), ]

Solution

  • A base R solution:

    df[rowSums(is.na(df)) <= 1, ]
    

    Its dplyr equivalent (pick() requires dplyr 1.1.0 or later):

    library(dplyr)
    
    df %>%
      filter(rowSums(is.na(pick(everything()))) <= 1)
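
To see why the base R one-liner works, here is a minimal sketch that rebuilds the example data frame from the question and applies the filter step by step. The names na_count and result are only for illustration and are not part of the original answer.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  Var1 = 1:6,
  Var2 = c(NA, 34, NA, NA, NA, 40),
  Var3 = c(NA, NA, NA, 55, NA, 28)
)

# is.na(df) returns a logical matrix; rowSums() counts the TRUE values per row,
# i.e. the number of NA values in each observation
na_count <- rowSums(is.na(df))

# Keep only the rows with at most one NA
result <- df[na_count <= 1, ]
print(result)
```

With this data, the rows where Var1 is 2, 4 and 6 are kept, which matches the desired output in the question. Changing the threshold (for example `<= 2`) relaxes the filter without any need for merge().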