
Concisely assign vector output of a function to multiple variables in dplyr


I am trying to assign the vector output of a function (i.e. output of length greater than 1) to multiple columns in a single operation, or at least as concisely as possible.

Take range(), for example, which returns a numeric vector of length 2 holding the minimum and maximum, respectively. Say I want to compute range() per group and assign the output to two columns, min and max.

My current approach combines summarize with manually adding a key column and then reshaping to wide format:

library(magrittr)

# create data
df <- dplyr::tibble(group = rep(letters[1:3], each = 3),
                    x = rpois(9, 10))

df
#> # A tibble: 9 x 2
#>   group     x
#>   <chr> <int>
#> 1 a         8
#> 2 a        12
#> 3 a         8
#> 4 b         9
#> 5 b        14
#> 6 b         9
#> 7 c        11
#> 8 c         6
#> 9 c        12

# summarize returns two rows per group (min, then max);
# note that dplyr >= 1.1.0 deprecates this in favour of reframe()
range_df <- df %>% 
  dplyr::group_by(group) %>% 
  dplyr::summarize(range = range(x)) %>% 
  dplyr::ungroup()

range_df
#> # A tibble: 6 x 2
#>   group range
#>   <chr> <int>
#> 1 a         8
#> 2 a        12
#> 3 b         9
#> 4 b        14
#> 5 c         6
#> 6 c        12

# add key and reshape
range_df %>% 
  dplyr::mutate(key = rep(c("min", "max"), 3)) %>% 
  tidyr::pivot_wider(names_from = key, values_from = range)
#> # A tibble: 3 x 3
#>   group   min   max
#>   <chr> <int> <int>
#> 1 a         8    12
#> 2 b         9    14
#> 3 c         6    12

Is there a more elegant / concise alternative to this?

Edit:

Ideally the alternative solution could handle an arbitrary number of outputs (e.g. if the function returns an output with length 3 then 3 variables should be created).


Solution

  • Based on onyambu's answer, I built a small generic function for this. There are probably some edge cases where it will not work.

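    # Apply `fun` to `x` and return the result as a one-row data frame,
    # so that each element of the output becomes its own column.
    out2col <- function(x, fun, out_names = NULL, add_args = list()) {
      # call fun(x, <add_args>)
      tmp <- do.call(what = fun, args = c(list(x), add_args))
      # transpose the vector into a 1 x n matrix, then into a single row
      out <- data.frame(t(tmp))
      if (length(out_names) != 0) {
        if (length(tmp) != length(out_names)) {
          stop("provided names did not match the number of outputs")
        }
        out <- setNames(object = out, nm = out_names)
      }
      return(out)
    }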
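
    To see why the helper yields exactly one row per call, it may help to look at the transpose-and-rename step in isolation. This is only a base R sketch of that one step with hard-coded example values, not a replacement for the function above:

    ```r
    # A length-2 result, as range() would return for one group
    tmp <- range(c(8L, 12L, 8L))

    # t() turns the vector into a 1 x 2 matrix; data.frame() makes it one row
    out <- data.frame(t(tmp))
    names(out)  # default names for an unnamed matrix: "X1" "X2"

    # setNames() swaps in caller-supplied column names
    out <- setNames(out, c("min", "max"))
    out$min  # 8
    out$max  # 12
    ```

    The default X1, X2 names are where the x_X1 and x_X2 columns in the first example come from.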
    

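    Example without any additional parameters:

    library(dplyr)  # >= 1.1.0 is needed for .unpack and .by

    # an anonymous function is used because passing extra arguments
    # through across(...) is deprecated since dplyr 1.1.0
    df %>%
      summarise(across(x, function(v) out2col(v, fun = range), .unpack = TRUE),
                .by = group)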
    

    Output:

    # A tibble: 3 × 3
      group  x_X1  x_X2
      <chr> <int> <int>
    1 a         7    10
    2 b        11    14
    3 c         9    14
    

    Example with additional parameters:

    df %>%
      summarise(across(x,
                       function(v) out2col(v, fun = quantile,
                                           out_names = c("min", "max", "Q25"),
                                           add_args = list(probs = c(0, 1, 0.25))),
                       .unpack = TRUE),
                .by = group)
    

    Output:

    # A tibble: 3 × 4
      group x_min x_max x_Q25
      <chr> <dbl> <dbl> <dbl>
    1 a         7    10   7.5
    2 b        11    14  11.5
    3 c         9    14  10
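
A related sketch, not part of the answer above: summarise() also unpacks an unnamed data-frame result into separate columns, so a one-row data frame can be returned directly without across(), which avoids the x_ prefix. The helper name as_row is hypothetical, the data values are fixed here only for reproducibility, and dplyr >= 1.1.0 is assumed for .by.

```r
library(dplyr)

# Same shape as the question's data, with fixed values instead of rpois()
df <- tibble(group = rep(letters[1:3], each = 3),
             x = c(8L, 12L, 8L, 9L, 14L, 9L, 11L, 6L, 12L))

# Hypothetical helper: run `fun` on `x` and return a named one-row data frame
as_row <- function(x, fun, out_names) {
  as.data.frame(t(setNames(fun(x), out_names)))
}

# The unnamed data-frame expression is spread into the columns min and max
res <- df %>%
  summarise(as_row(x, range, c("min", "max")), .by = group)

res
```

Because the column names come from out_names, the same pattern should extend to functions that return three or more values, provided out_names has the matching length.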