Search code examples
rdata-manipulationmissing-data

Selecting Rows with Missing Data in a Range of Columns


There are several ways to identify and manipulate individual cells with missing data in R, e.g., with complete.cases or even rowSums.

However, I've not been able to find---or figure out myself---an expedient way to select rows that have missing data within a subsetted range of columns.

For example, in dataframe df:

df <- data.frame(D1 = c('A', 'B', 'C', 'D'),
                 D2 = c(NA, 0, 1, 1),
                 V1 = c(11, NA, 33, NA),
                 V2 = c(111, 222, NA, NA)
                 )
df

# D1  D2  V1  V2
#  A  NA  11 111    
#  B   0  NA 222    
#  C   1  33  NA    
#  D   1  NA  NA    

I would like to select all rows that have missing data in both columns V1 and V2, thus selecting row D but not rows B or C (or A).

I have a larger range of columns than given in that toy example, so selecting a set of columns with, e.g., && could make for a long command.

N.B., a similar SO question addresses selecting rows where none are NAs.


Solution

  • You can use dplyr::if_all. You can select the columns very flexibly with tidyselect, for instance using :, c, starts_with...

    library(dplyr)
    df %>% 
      filter(if_all(V1:V2, is.na))
    
    #  D1 D2 V1 V2
    #1  D  1 NA NA
    

    Also works (this shows the flexibility of tidyselect):

    filter(df, if_all(3:4, is.na))
    filter(df, if_all(starts_with("V"), is.na))
    filter(df, if_all(c(V1, V2), is.na))
    filter(df, if_all((last_col()-1):last_col(), is.na))
    filter(df, if_all(num_range("V", 1:2), is.na))