Search code examples
rfiltersubset

Removing rows based on the values in multiple columns in R


In my data, I need to remove the rows if the values in three columns (V1, V2, V3) are either a combination of 12 and NAs (like row 2) or all three of them equal 12 (like row 5). Please note that if all values equal NA (like row 3) it should remain in the data.

df <- data.frame(
  "V1" = c(NA, NA, NA, 12, 12),
  "V2" = c(55, NA, NA, 14, 12),
  "V3" = c(21, 12, NA, NA, 12),
  "V4" = c(NA, 32, NA, NA, NA),
  "V5" = c(NA, NA, 18, NA, NA)
)
     V1 V2 V3 V4 V5 
1    NA 55 21 NA NA
2    NA NA 12 32 NA
3    NA NA NA NA 18
4    12 14 NA NA NA
5    12 12 12 NA NA

I would like the following result:

     V1 V2 V3 V4 V5 
1    NA 55 21 NA NA
3    NA NA NA NA 18
4    12 14 NA NA NA

Thanks in advance for your help.


Solution

  • First set a col variable storing the target column names. The total number of records being NA or 12 should match the length of col.

    col <- c("V1", "V2", "V3")
    
    df[apply(df[, col], 1, \(x) sum((is.na(x) | x == 12), na.rm = T) != length(col)), ]
    

    Or

    df[rowSums(is.na(df[, col]) | df[, col] == 12, na.rm = TRUE) < length(col), ]
    

    Update: To remove rows that either include both 12 and NA or all of the values equal 12, use the following code:

    df[apply(df[, col], 1, \(x) !((sum((is.na(x) | x == 12), na.rm = T) == length(col)) & 
                                   (sum(is.na(x)) >= 1 & sum(x == 12, na.rm = T) >= 1) |
                                    sum(x == 12, na.rm = T) == length(col))), ]
    

    Output

      V1 V2 V3 V4 V5
    1 NA 55 21 NA NA
    3 76 NA NA NA 12
    4 12 14 NA NA NA