Tags: r, dplyr, purrr, tidyselect

R - Calculation on nested tibbles of the same size


I want to calculate how each set of data in a tibble differs from a baseline dataset.

To prototype, I wrote this R code, which applies an arithmetic operator (here `+`) to two tibbles of the same size:

# this works (assumes tibble, dplyr and tidyr are attached, e.g. via the tidyverse)
tbl_a <- tibble(a1 = 1, a2 = 2, a3 = 3)
tbl_b <- tibble(a1 = 4, a2 = 5, a3 = 6)
tbl_a + tbl_b

# > tbl_a + tbl_b
#   a1 a2 a3
# 1  5  7  9

Now I want to apply this to a set of tibbles:

# compare multiple datasets to baseline of same shape
tbl_a1 <- tibble(id = "i", a1 = 1, a2 = 2, a3 = 3)
tbl_a2 <- tibble(id = "ii", a1 = 2, a2 = 3, a3 = 4)
tbl_a3 <- tibble(id = "iii", a1 = 3, a2 = 4, a3 = 5)
tbl_base <- tibble(id_base = "baseline", a1 = 4, a2 = 5, a3 = 6)
tbls <- bind_rows(tbl_a1, tbl_a2, tbl_a3)

tbls_compare <- tbls %>%
  nest(set = starts_with("a")) %>%
  bind_cols(tbl_base) %>%
  nest(set_baseline = starts_with("a"))
#   id    set              id_base  set_baseline
#   <chr> <list>           <chr>    <list>
# 1 i     <tibble [1 × 3]> baseline <tibble [1 × 3]>
# 2 ii    <tibble [1 × 3]> baseline <tibble [1 × 3]>
# 3 iii   <tibble [1 × 3]> baseline <tibble [1 × 3]>

I expected to be able to subtract the nested tibbles just as in the tbl_a + tbl_b example.

However, I'm greeted with an error:

> tbls_compare %>%
+   mutate(diff_to_base = set_baseline - set)
Error in `mutate()`:
! Problem while computing `diff_to_base = set_baseline - set`.
Caused by error in `set_baseline - set`:
! non-numeric argument to binary operator
Run `rlang::last_error()` to see where the error occurred.

I tried using purrr::map but could not work out a solution on my own.

Could somebody please enlighten me?


Solution

  • I think you need to add rowwise(); otherwise set_baseline - set attempts to subtract one whole list-column of tibbles from another, rather than working on one pair of tibbles per row.

    tbls_compare <- tbls |>
      nest(set = starts_with("a")) |>
      bind_cols(tbl_base) |>
      nest(set_baseline = starts_with("a")) |>
      rowwise() |>
      mutate(diff_to_base = list(as_tibble(set_baseline - set)))
    
    tbls_compare
    
    # A tibble: 3 × 5
    # Rowwise: 
      id    set              id_base  set_baseline     diff_to_base    
      <chr> <list>           <chr>    <list>           <list>          
    1 i     <tibble [1 × 3]> baseline <tibble [1 × 3]> <tibble [1 × 3]>
    2 ii    <tibble [1 × 3]> baseline <tibble [1 × 3]> <tibble [1 × 3]>
    3 iii   <tibble [1 × 3]> baseline <tibble [1 × 3]> <tibble [1 × 3]>
    
    

    This is how it looks when unnested:

    tbls_compare |>
      unnest(cols = diff_to_base)
    
    # A tibble: 3 × 7
      id    set              id_base  set_baseline        a1    a2    a3
      <chr> <list>           <chr>    <list>           <dbl> <dbl> <dbl>
    1 i     <tibble [1 × 3]> baseline <tibble [1 × 3]>     3     3     3
    2 ii    <tibble [1 × 3]> baseline <tibble [1 × 3]>     2     2     2
    3 iii   <tibble [1 × 3]> baseline <tibble [1 × 3]>     1     1     1
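
Since the question mentions purrr, here is a minimal sketch of an alternative that walks over the two list-columns in parallel with purrr::map2() instead of using rowwise(). It rebuilds the example data from the question; the name diffs is only illustrative and is not part of the original code.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)

# Same example data as in the question, built in one call.
tbls <- tibble(
  id = c("i", "ii", "iii"),
  a1 = c(1, 2, 3),
  a2 = c(2, 3, 4),
  a3 = c(3, 4, 5)
)
tbl_base <- tibble(id_base = "baseline", a1 = 4, a2 = 5, a3 = 6)

tbls_compare <- tbls %>%
  nest(set = starts_with("a")) %>%
  bind_cols(tbl_base) %>%
  nest(set_baseline = starts_with("a")) %>%
  # map2() hands one pair of tibbles at a time to the function:
  # .x is an element of set_baseline, .y the matching element of set.
  mutate(diff_to_base = map2(set_baseline, set, ~ as_tibble(.x - .y)))

# Unnest only the differences (diffs is a hypothetical helper name).
diffs <- tbls_compare %>%
  select(id, diff_to_base) %>%
  unnest(cols = diff_to_base)

print(diffs)
```

With this data the differences should match the rowwise() result above (3, 2 and 1 in every column for i, ii and iii). One practical difference is that the result is an ordinary tibble rather than a rowwise one, so later verbs are not applied row by row unless you ask for that.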