
Map over list of dataframes and apply custom mutate-function (purrr, dplyr)


So I have this list:

cms <- list(`0` = structure(list(fn = 0L, fp = 34L, tn = 0L, tp = 34L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.1` = structure(list(
    fn = 1L, fp = 26L, tn = 8L, tp = 33L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.2` = structure(list(
    fn = 3L, fp = 22L, tn = 12L, tp = 31L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.3` = structure(list(
    fn = 5L, fp = 7L, tn = 27L, tp = 29L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.4` = structure(list(
    fn = 5L, fp = 3L, tn = 31L, tp = 29L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.5` = structure(list(
    fn = 7L, fp = 1L, tn = 33L, tp = 27L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.6` = structure(list(
    fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.7` = structure(list(
    fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.8` = structure(list(
    fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.9` = structure(list(
    fn = 30L, fp = 0L, tn = 34L, tp = 4L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `1` = structure(list(
    fn = 34L, fp = 0L, tn = 34L, tp = 0L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")))

It is basically a list of length 11, produced by applying a quantile-regression model at 11 different quantiles (0 to 1 in steps of 0.1). Each element is a dataframe containing the true/false positive/negative counts. Now I would like to write a function that lets me "dynamically" compute the various metrics that can be derived from these counts. The first element, for example, looks like this:

> cms[[1]]
# A tibble: 1 x 4
     fn    fp    tn    tp
  <int> <int> <int> <int>
1     0    34     0    34

As it is a list, I wanted to use purrr's map, lapply, or something similar. I then thought: some day I want the True Positive Rate, and some day I may want the Specificity. Hence I thought I would write a function that takes some of the columns as input and does a "classic" dplyr::mutate. But once again I am stuck on my knowledge of tidy evaluation. So I did something like this (and please don't judge it):

fun = function(...){
  f = rlang::enexpr(...)
  return(f)
}

fpr = fun(tp / tp + fn)

# does not work
map(cms, ~mutate(.x, fpr=fpr)) 

# this (non-tidy-eval) works
map(cms, ~mutate(.x, fpr=tp / tp + fn))

I would really like to pass in columns dynamically and compute the result using tidy evaluation. I would greatly appreciate any help or pointers :)


Solution

  • You can also use the following solution.

    • First we define a function that takes a data set plus any number of further arguments. The data set gets an explicit data argument, and everything else is captured through ...
    • We then use enquos(), which returns a list of quosures (defused expressions), to capture what was passed through ... The big-bang operator !!!, normally used for splicing a list of arguments, injects that list into eval_tidy(), which evaluates the expression in the context of our data set data. As written, the function expects a single expression, since the result goes into one column, out.
    • Finally we iterate over the list, apply the function to each element, and row-bind the results with map_dfr().
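    library(dplyr)
    library(purrr)
    library(rlang)
    
    # Compute one metric from the captured expression and store it in `out`
    fn <- function(data, ...) {
      args <- enquos(...)  # defuse the expression(s) passed through ...
      
      data %>%
        mutate(out = eval_tidy(!!!args, data = data))
    }
    
    # `cms` is the list of tibbles from the question
    cms %>%
      map_dfr(~ .x %>% fn(tp / (tp + fn)))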
    
    # A tibble: 11 x 5
          fn    fp    tn    tp   out
       <int> <int> <int> <int> <dbl>
     1     0    34     0    34 1    
     2     1    26     8    33 0.971
     3     3    22    12    31 0.912
     4     5     7    27    29 0.853
     5     5     3    31    29 0.853
     6     7     1    33    27 0.794
     7     8     0    34    26 0.765
     8     8     0    34    26 0.765
     9     8     0    34    26 0.765
    10    30     0    34     4 0.118
    11    34     0    34     0 0
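
As a complement to the answer above, here is a minimal sketch of two other common tidy-eval patterns, not taken from the original answer. The first stays close to the question's attempt: the formula is defused once with expr() and injected into mutate() with !!. The second forwards named expressions in ... directly to mutate(), so several metrics can be computed in one call. The names cms_small, tpr and add_metrics are hypothetical and used for illustration only, and cms_small holds just two of the question's tibbles to keep the example short.

```r
library(dplyr)
library(purrr)
library(rlang)

# Two of the confusion-matrix tibbles from the question (illustrative subset)
cms_small <- list(
  `0.1` = tibble(fn = 1L, fp = 26L, tn = 8L, tp = 33L),
  `0.5` = tibble(fn = 7L, fp = 1L, tn = 33L, tp = 27L)
)

# Pattern 1: defuse the formula once with expr(), then inject it with !!
tpr <- expr(tp / (tp + fn))
res1 <- map(cms_small, ~ mutate(.x, tpr = !!tpr))

# Pattern 2: forward named expressions in ... straight to mutate()
# (add_metrics is a hypothetical helper name)
add_metrics <- function(data, ...) {
  mutate(data, ...)
}

res2 <- map_dfr(
  cms_small,
  ~ add_metrics(.x, tpr = tp / (tp + fn), specificity = tn / (tn + fp)),
  .id = "quantile"  # keep the list names as a column
)
```

In the question, mutate(.x, fpr = fpr) fails because mutate() tries to store the defused expression itself as a column; the !! in the first pattern is what tells mutate() to evaluate it against the data instead. Note also that the formula needs parentheses: tp / tp + fn is parsed as (tp / tp) + fn.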