Search code examples
rdplyrtidyrmagrittr

R: Return rows with only 1 non-NA value for a set of columns


Suppose I have a data.table with the following data:

colA  colB  colC  result
1     2     3     231
1     NA    2     123
NA    3     NA    345
11    NA    NA    754

How would I use dplyr and magrittr to only select the following rows:

colA  colB  colC result
NA    3     NA   345
11    NA    NA   754

The selection criteria is: only 1 non-NA value for columns A-C (i.e. colA, colB, ColC)

I have been unable to find a similar question; guessing this is an odd situation.


Solution

  • A base R option would be

    df[apply(df, 1, function(x) sum(!is.na(x)) == 1), ]
    #  colA colB colC
    #3   NA    3   NA
    #4   11   NA   NA
    

    A dplyr option is

    df %>% filter(rowSums(!is.na(.)) == 1)
    

    Update

    In response to your comment, you can do

    df[apply(df[, -ncol(df)], 1, function(x) sum(!is.na(x)) == 1), ]
    #  colA colB colC result
    #3   NA    3   NA    345
    #4   11   NA   NA    754
    

    Or the same in dplyr

    df %>% filter(rowSums(!is.na(.[-length(.)])) == 1)
    

    This assumes that the last column is the one you'd like to ignore.


    Sample data

    df <-read.table(text = "colA  colB  colC
    1     2     3
    1     NA    2
    NA    3     NA
    11    NA    NA", header = T)
    

    Sample data for update

    df <- read.table(text =
    "colA  colB  colC  result
    1     2     3     231
    1     NA    2     123
    NA    3     NA    345
    11    NA    NA    754
    ", header = T)