
Using testthat to check each variable in a data frame for NA values


I'm building a dataset from very messy raw files, and am using testthat to make sure things don't break as new data is added or cleaning rules are corrected. I'd like to add a test to see if there are any NA values in the data, and, if so, to report which columns they are in.

It's trivial to do so manually by writing a test for each column. But that solution would be error-prone and a pain to maintain, because I'd have to remember to update the NA test file every time a column is added to or removed from the dataset.

Here is example code for what I have:

df <- tidyr::tribble(
  ~A, ~B, ~C, 
  1, 2, 3,
  NA, 2, 3, 
  1, 2, NA
)

# checks all variables, doesn't report which have NA values
testthat::test_that("NA Values", {
  testthat::expect_true(sum(is.na(df)) == 0)
})

# Checks each column, but is a pain to maintain
testthat::test_that("Variable specific checks", {
  testthat::expect_true(sum(is.na(df$A)) == 0)
  testthat::expect_true(sum(is.na(df$B)) == 0)
  testthat::expect_true(sum(is.na(df$C)) == 0)
})

Solution

  • Solution 1: quick and (not so) dirty

    df <- tidyr::tribble(
      ~A, ~B, ~C, 
      1, 2, 3,
      NA, 2, 3, 
      1, 2, NA
    )
    
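    # Flags every column that contains at least one NA and reports them in a single expectation
    testthat::test_that("Variable specific checks", {
        res <- apply(df, 2, function(x) sum(is.na(x)) > 0)
        # The test passes only when no column is flagged
        testthat::expect_false(any(res), label = paste(paste(which(res), collapse = ", "), "contain(s) NA(s)"))
    })
    

    which, on the example data, should fail with a message along these lines (the exact wording depends on the testthat version)

    Error: Test failed: 'Variable specific checks'
    * 1, 3 contain(s) NA(s) isn't false.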
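
If column names are more useful than positions in the failure message, the same check can be sketched with base R's `anyNA()` and `vapply()`. This is only an illustration: `na_columns` is a hypothetical helper, not part of testthat, and the data frame is rebuilt with `data.frame()` to keep the sketch self-contained.

```r
# Hypothetical helper (not part of testthat): names of the columns
# that contain at least one NA
na_columns <- function(data) {
  has_na <- vapply(data, anyNA, logical(1))
  names(data)[has_na]
}

# Same values as the example data, built with base R
df <- data.frame(A = c(1, NA, 1), B = c(2, 2, 2), C = c(3, 3, NA))

bad <- na_columns(df)
print(bad)  # "A" "C"

# A label that could be passed to an expectation, for example
# testthat::expect_false(length(bad) > 0, label = msg)
msg <- paste(paste(bad, collapse = ", "), "contain(s) NA(s)")
```

Because the helper iterates over whatever columns the data frame has, nothing needs updating when columns are added or removed.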
    

  • Solution 2: tailor an expect_() function to your needs

    # Custom expectation: passes only when `object` is TRUE; `label` names the offending columns
    expect_true2 <- function(object, info = NULL, label = NULL) {
        act <- testthat::quasi_label(rlang::enquo(object), label, arg = "object")
        testthat::expect(
            identical(as.vector(act$val), TRUE),
            sprintf("Column %s contain(s) NA(s).", act$lab),
            info = info
        )
        invisible(act$val)
    }

    testthat::test_that("Variable specific checks", {
        res <- apply(df, 2, function(x) sum(is.na(x)) > 0)
        # Pass only when no column contains an NA
        expect_true2(!any(res), label = paste(which(res), collapse = ","))
    })
    

    which should return

    Error: Test failed: 'Variable specific checks'
    * Column 1,3 contain(s) NA(s).
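
Another option, sketched here under the assumption that one failure per column is acceptable, is to loop over the column names and emit one expectation each. Inside `test_that()`, a failed expectation does not stop the remaining ones, so every column with NA values is listed separately. `expect_no_na` and `df_clean` are hypothetical names introduced for this illustration.

```r
library(testthat)

# Hypothetical helper: one expectation per column, so each column that
# contains NA shows up as its own failure in the test report
expect_no_na <- function(data) {
  for (col in names(data)) {
    expect_false(anyNA(data[[col]]), label = sprintf("column '%s' has NA", col))
  }
  invisible(data)
}

# A frame without NA values, so this sketch runs cleanly
df_clean <- data.frame(A = c(1, 1), B = c(2, 2), C = c(3, 3))

test_that("no column contains NA", {
  expect_no_na(df_clean)
})
```

Run against the example `df` instead, the expectations for columns A and C should fail while B passes. New columns are picked up automatically because the loop iterates over `names(data)`.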