I'm building a dataset from very messy raw files and am using testthat
to make sure nothing breaks as new data is added or cleaning rules are corrected. I'd like to add a test that checks whether there are any NA
values in the data and, if so, reports which columns they are in.
It's trivial to do this manually by writing a test for each column. But that solution would be a pain to maintain and error-prone, as I don't want to have to remember to update the test-NA
file every time a column is added to or removed from the dataset.
Here is example code for what I have:
df <- tidyr::tribble(
  ~A, ~B, ~C,
  1, 2, 3,
  NA, 2, 3,
  1, 2, NA
)
# Checks all variables, but doesn't report which have NA values
testthat::test_that("NA Values", {
  testthat::expect_true(sum(is.na(df)) == 0)
})
# Checks each column, but is a pain to maintain
testthat::test_that("Variable specific checks", {
  testthat::expect_true(sum(is.na(df$A)) == 0)
  testthat::expect_true(sum(is.na(df$B)) == 0)
  testthat::expect_true(sum(is.na(df$C)) == 0)
})
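df <- tidyr::tribble(
  ~A, ~B, ~C,
  1, 2, 3,
  NA, 2, 3,
  1, 2, NA
)
# Checks every column in a single expectation and reports which ones contain NA
testthat::test_that("Variable specific checks", {
  # TRUE for each column that contains at least one NA
  res <- apply(df, 2, function(x) sum(is.na(x)) > 0)
  # Passes only when no column contains NA; the label lists the offending columns
  testthat::expect_true(
    !any(res),
    label = paste(paste(which(res), collapse = ", "), "contain(s) NA(s)")
  )
})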
which should return
Error: Test failed: 'Variable specific checks'
* 1, 3 contain(s) NA(s) isn't true.
# Custom expectation: like expect_true(), but with its own failure message
expect_true2 <- function(object, info = NULL, label = NULL) {
  act <- testthat::quasi_label(rlang::enquo(object), label, arg = "object")
  testthat::expect(
    identical(as.vector(act$val), TRUE),
    sprintf("Column %s contain(s) NA(s).", act$lab),
    info = info
  )
  invisible(act$val)
}
testthat::test_that("Variable specific checks", {
  # TRUE for each column that contains at least one NA
  res <- apply(df, 2, function(x) sum(is.na(x)) > 0)
  # Passes only when no column contains NA
  expect_true2(!any(res), label = paste(which(res), collapse = ","))
})
which should return
Error: Test failed: 'Variable specific checks'
* Column 1,3 contain(s) NA(s).
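Both versions above report column positions (1, 3) rather than column names. Here is a minimal sketch of a variant that returns the names instead, using only base R. Note that na_columns is a hypothetical helper name, not part of testthat, and the example data is rebuilt with data.frame() so the snippet runs without extra packages:

```r
# Hypothetical helper (not part of testthat):
# returns the names of all columns that contain at least one NA.
na_columns <- function(data) {
  has_na <- vapply(data, anyNA, logical(1))
  names(data)[has_na]
}

# Same values as the example data above, built with base R.
df <- data.frame(
  A = c(1, NA, 1),
  B = c(2, 2, 2),
  C = c(3, 3, NA)
)

bad <- na_columns(df)
print(bad)
# [1] "A" "C"
```

Inside test_that(), comparing the result against character(0), for example with testthat::expect_identical(na_columns(df), character(0)), should then surface the offending names in the failure output, though the exact wording of that output depends on the testthat version.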