
Delete columns that contain more than one NA in a data frame in R


I have the following data frame:

data <- data.frame(
  ID = c("Per1", "Per2", "Per3"),
  Col1 = c(1, 2, NA),
  Col2 = c(2, NA, NA),
  Col3 = c(3, NA, 5),
  Col4 = c(4, NA, NA)
)

    ID Col1 Col2 Col3 Col4
1 Per1    1    2    3    4
2 Per2    2   NA   NA   NA
3 Per3   NA   NA    5   NA

Now I want to delete only the columns that contain more than one NA (Col2 and Col4), while keeping the columns that contain only one NA (Col1 and Col3).

Thanks


Solution

  • You could try something like this in base R.

    The condition below returns a logical vector (TRUE/FALSE), one entry per column, that selects only the columns with fewer than 2 NAs.

    data[, colSums(is.na(data)) < 2]
    
        ID Col1 Col3
    1 Per1    1    3
    2 Per2    2   NA
    3 Per3   NA    5
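
If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: `drop_na_cols` and its `max_na` argument are hypothetical names made up for illustration, not part of base R. It also passes `drop = FALSE`, so the result stays a data frame even when only one column survives the filter.

```r
# Same sample data as in the question.
data <- data.frame(
  ID = c("Per1", "Per2", "Per3"),
  Col1 = c(1, 2, NA),
  Col2 = c(2, NA, NA),
  Col3 = c(3, NA, 5),
  Col4 = c(4, NA, NA)
)

# Hypothetical helper: keep the columns with at most `max_na` missing values.
drop_na_cols <- function(df, max_na = 1) {
  na_counts <- colSums(is.na(df))  # number of NAs in each column
  keep <- na_counts <= max_na      # logical vector, one entry per column
  df[, keep, drop = FALSE]         # drop = FALSE always returns a data frame
}

result <- drop_na_cols(data)
names(result)
# [1] "ID"   "Col1" "Col3"
```

Calling `drop_na_cols(data, max_na = 0)` would keep only the columns with no NAs at all, which here is just `ID`.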