Tibbles (from the tidyverse) can contain list-columns, which are useful for storing, e.g., nested data frames or objects not normally found in a data.frame.
Here is an example:
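library("dplyr")
# One row per species: the remaining columns are nested into the list-column
# "data", and one lm fit per species is stored in the list-column "model".
nested_df <-
  iris %>%
  group_by(Species) %>%
  tidyr::nest() %>%
  mutate(model = purrr::map(data, lm, formula = Sepal.Length ~ .))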
nested_df
# # A tibble: 3 x 3
# Species data model
# <fct> <list> <list>
# 1 setosa <tibble [50 × 4]> <S3: lm>
# 2 versicolor <tibble [50 × 4]> <S3: lm>
# 3 virginica <tibble [50 × 4]> <S3: lm>
I am writing some tests with testthat: how do I test equality between such data frames? testthat::expect_equal does not work because all.equal and dplyr::all_equal both fail:
all.equal(nested_df, nested_df)
# Error in equal_data_frame(target, current, ignore_col_order = ignore_col_order, :
# Can't join on 'data' x 'data' because of incompatible types (list / list)
I considered using testthat::expect_true(identical(...)), but it is often too strict. For example, a nested_df2 built with exactly the same code does not pass identical(), because the .Environment attribute of the terms embedded in each lm model differs, even though the models are equal and pass all.equal.
identical(nested_df, nested_df2)
# [1] FALSE
identical(nested_df$model, nested_df2$model, ignore.environment = TRUE)
# [1] FALSE
all.equal(nested_df$model, nested_df2$model, tolerance = 0)
# [1] TRUE
How can I test equality of tibbles with list-columns like nested_df?
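A rather blunt approach, but it seems to work on your example: call the list method of all.equal directly, which skips the data-frame method and compares the columns one by one.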
all.equal.list(nested_df, nested_df)
# [1] TRUE
all.equal.list(nested_df, mutate(nested_df, Species = sample(Species)))
# [1] "Component “Species”: 2 string mismatches"
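The same comparison can be sketched with base R only, which makes the behaviour easier to check without the tidyverse. This assumes a plain data.frame as a stand-in for the tibble, and fit_by_species() is a hypothetical helper: it builds one lm() fit per species in a list-column, twice, so the two copies differ only in the environments captured by their formulas.

```r
# Hypothetical helper: one lm() fit per species, stored in a list-column.
# Each call of the anonymous function creates a new formula environment.
fit_by_species <- function() {
  groups <- split(iris[, 1:4], iris$Species)
  df <- data.frame(Species = names(groups))
  df$model <- lapply(groups, function(d) lm(Sepal.Length ~ ., data = d))
  df
}

a <- fit_by_species()
b <- fit_by_species()

identical(a, b)               # FALSE: the terms carry different environments
isTRUE(all.equal.list(a, b))  # TRUE

# A real difference is still reported, as a character vector of messages.
b_changed <- b
b_changed$Species <- rev(b_changed$Species)
all.equal.list(a, b_changed)
```

In a testthat test this could be wrapped as expect_true(isTRUE(all.equal.list(nested_df, nested_df2))). Keep in mind that all.equal.list() compares columns and rows positionally, whereas dplyr::all_equal() ignores column and row order by default.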