
Run an assert() check on a subset of the data in-line without modifying output data.frame


I want to exempt a few rows from a check with assertr::assert() without modifying the data.frame that's passed through.

For instance, say I want to assert that there are no duplicated values of mtcars$qsec among the rows where mtcars$am is 0. The rows where am == 1 should be exempt from the check, and I want to get back all of mtcars.

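This fails, as it should, because qsec has duplicated values across the full data set:

library(dplyr)    # provides %>% and filter()
library(assertr)
mtcars %>%
  assert(is_uniq, qsec)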

And this passes, but it returns only the filtered data.frame:

mtcars %>% 
  filter(am == 0) %>% 
  assert(is_uniq, qsec)

What I want is something like this: it should succeed and pass all the data through if there are no duplicated values of qsec where am == 0, and throw an error if there are:

mtcars %>% 
  assert(filter(., am == 0), is_uniq, qsec)

But that doesn't work. Is there a way I can check a subset of the data in a pipeline while still getting the whole data set out at the end?


Solution

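  • You can use a lambda expression, as documented in ?magrittr::`%>%`. The braced block runs assert() on the filtered rows only for its side effect (an error on failure) and then returns `.`, the unmodified input:

    mtcars0 <- mtcars %>% { {assert(filter(., am == 0), is_uniq, qsec); .} }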
    identical(mtcars0, mtcars)
    ## [1] TRUE
    

    Perhaps a more transparent example:

    d <- data.frame(g = rep(1:2, each = 3), x = c(1, 2, 3, rep(4, 3)))
    d
    ##   g x
    ## 1 1 1
    ## 2 1 2
    ## 3 1 3
    ## 4 2 4
    ## 5 2 4
    ## 6 2 4
    
    d0 <- d %>% { {assert(filter(., g == 1), is_uniq, x); .} }
    identical(d0, d)
    ## [1] TRUE
    
    d %>% { {assert(filter(., g == 2), is_uniq, x); .} }
    ## Column 'x' violates assertion 'is_uniq' 3 times
    ##     verb redux_fn predicate column index value
    ## 1 assert       NA   is_uniq      x     1     4
    ## 2 assert       NA   is_uniq      x     2     4
    ## 3 assert       NA   is_uniq      x     3     4
    ##
    ## Error: assertr stopped execution
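
If this pattern comes up more than once, the lambda can be wrapped in a small helper. The following is only a sketch: assert_where() is a hypothetical name, not part of assertr, and it assumes dplyr and assertr are installed. It filters with the given condition, forwards the remaining arguments to assert(), and returns the original data.

```r
library(dplyr)
library(assertr)

# Hypothetical helper (not an assertr function): check only the rows
# matching `cond`, then hand back the full, unfiltered data.
assert_where <- function(data, cond, ...) {
  # {{ cond }} forwards the unevaluated condition to filter();
  # `...` carries the predicate and the columns on to assert().
  assert(dplyr::filter(data, {{ cond }}), ...)
  data
}

d <- data.frame(g = rep(1:2, each = 3), x = c(1, 2, 3, rep(4, 3)))

# Passes: x is unique where g == 1, and all six rows come back.
d0 <- d %>% assert_where(g == 1, is_uniq, x)
identical(d0, d)
## [1] TRUE
```

Calling d %>% assert_where(g == 2, is_uniq, x) should stop with the same assertr error as the lambda version, since the check itself is unchanged.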