Testing equality of tibbles with list-columns or nested data.frame


Tibbles (from the tidyverse) can contain list-columns, which are useful for holding e.g. nested data frames or objects not traditionally found in a data.frame.

Here is an example:

library("dplyr")

nested_df <-
      iris %>%
      group_by(Species) %>%
      tidyr::nest() %>%
      mutate(model = purrr::map(data, lm, formula = Sepal.Length ~ .))

nested_df
#  # A tibble: 3 x 3
#   Species    data              model   
#   <fct>      <list>            <list>  
# 1 setosa     <tibble [50 × 4]> <S3: lm>
# 2 versicolor <tibble [50 × 4]> <S3: lm>
# 3 virginica  <tibble [50 × 4]> <S3: lm>

I am writing some tests with testthat: how do I test equality between such data.frames?

testthat::expect_equal does not work because all.equal and dplyr::all_equal both fail:

all.equal(nested_df, nested_df)
# Error in equal_data_frame(target, current, ignore_col_order = ignore_col_order,  : 
#  Can't join on 'data' x 'data' because of incompatible types (list / list)

I considered using testthat::expect_true(identical(...)), but it is often too strict. For example, building a second tibble, nested_df2, with exactly the same code is not enough to pass identical, because the .Environment attribute of the terms stored in each lm model differs, even though the models are equal and pass all.equal.

identical(nested_df, nested_df2)
# [1] FALSE
identical(nested_df$model, nested_df2$model, ignore.environment = TRUE)
# [1] FALSE
all.equal(nested_df$model, nested_df2$model, tolerance = 0)
# [1] TRUE

How can I test equality of tibbles with list-columns like nested_df?


Solution

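  • Kinda blunt approach, but it seems to work on your example: call all.equal.list directly, so the tibble is compared as a plain list, column by column, rather than through the data frame method that fails: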

    all.equal.list(nested_df, nested_df)
    
    # [1] TRUE
    
    all.equal.list(nested_df, mutate(nested_df, Species = sample(Species)))
    
    # [1] "Component “Species”: 2 string mismatches"
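
To use the approach above inside a testthat test, one option is to wrap it in a small helper. This is only a minimal sketch: expect_equal_nested is a hypothetical name, not a testthat function, and the example uses a base data.frame with a list-column instead of nested_df so that it runs without dplyr, tidyr or purrr.

```r
library(testthat)

# Hypothetical helper (not part of testthat): compares two data frames as
# plain lists, column by column, and passes all.equal's message on failure.
expect_equal_nested <- function(object, expected, ...) {
  comparison <- all.equal.list(object, expected, ...)
  expect_true(isTRUE(comparison), info = paste(comparison, collapse = "\n"))
}

# A small data frame with a list-column, standing in for nested_df.
df1 <- data.frame(id = 1:2)
df1$data <- list(head(iris), head(mtcars))
df2 <- df1

test_that("data frames with list-columns compare equal", {
  expect_equal_nested(df1, df2)
})
```

Note that this comparison is sensitive to row order and to attributes, since all.equal.list checks each column in place; extra arguments such as tolerance are passed on through `...`.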