
Tidyeval: apply function to data frames extracted from list


This is a simplified version of a problem involving a large list containing complex tables. I want to extract the tables from the list and apply a function to each one. Here we can create a simple list containing small named data frames:

library(tidyverse)

table_names <- c('dfA', 'dfB', 'dfC')

dfA <- tibble(a = 1:3, b = 4:6, c = 7:9)
dfB <- tibble(a = 10:12, b = 13:15, c = 16:18)
dfC <- tibble(a = 19:21, b = 22:24, c = 25:27)

df_list <- list(dfA, dfB, dfC) %>% setNames(table_names)

Here is a simplified example of the kind of operation I would like to apply:

dfA_mod <- df_list$dfA %>% 
  mutate(name = 'dfA') %>%
  select(name, everything()) 

In this example, I extract one of the three tables from the list (df_list$dfA), add a column holding the same value in every row (mutate(name = 'dfA')), and reorder the columns so that the new column comes first (select(name, everything())). The result is assigned to dfA_mod.

To solve the larger problem, I want to use one of the purrr::map() variants to apply the function over the character vector table_names, which was defined in the first block of code above. The elements of table_names serve two purposes: 1) naming the tables held in the list; and 2) supplying the values for the name column in each modified table.

I could write a function such as:

fun <- function(x) {
df_list$x %>% 
  mutate(name = x) %>%
  select(name, everything()) %>%
  assign(paste0(x, '_mod'), ., envir = .GlobalEnv)
}

And then use map() to create a new list of modified tables:

new_list <- df_list %>% map(table_name, fun(x))

But of course this code does not work, with the main obstacle being (for me at least) figuring out how to quote and unquote the right terms within the function. I'm a beginner at tidy evaluation, and I could use some help in specifying the function and using map properly.

Here is the desired output (for one modified table):

# A tibble: 3 x 4
  name      a     b     c
  <chr> <int> <int> <int>
1 dfA       1     4     7
2 dfA       2     5     8
3 dfA       3     6     9

Thanks in advance for any help!


Solution

  • We can use purrr::imap(), which passes each element of the list together with its name to the function:

    library(dplyr)
    library(purrr)
    
    df_out <- imap(df_list, ~.x %>% mutate(name = .y) %>% select(name, everything()))
    df_out
    
    #$dfA
    # A tibble: 3 x 4
    #  name      a     b     c
    #  <chr> <int> <int> <int>
    #1 dfA       1     4     7
    #2 dfA       2     5     8
    #3 dfA       3     6     9
    
    #$dfB
    # A tibble: 3 x 4
    #  name      a     b     c
    #  <chr> <int> <int> <int>
    #1 dfB      10    13    16
    #....
    #....
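
    The formula shorthand above (.x is the data frame, .y is its name) can also be written with a named function, which may be easier to read and reuse. The following is a minimal sketch, not part of the original answer: add_name() is a hypothetical helper, and the two-table list is a cut-down copy of the question's data so the block runs on its own.

    ```r
    library(dplyr)
    library(purrr)
    library(tibble)

    # Cut-down copy of the question's list, so the example is self-contained
    df_list <- list(
      dfA = tibble(a = 1:3, b = 4:6, c = 7:9),
      dfB = tibble(a = 10:12, b = 13:15, c = 16:18)
    )

    # add_name() is a hypothetical helper: it receives one table and its name,
    # adds the name as a column, and moves that column to the front
    add_name <- function(df, nm) {
      df %>%
        mutate(name = nm) %>%
        select(name, everything())
    }

    # imap() calls add_name(element, element_name) for each list element
    df_out <- imap(df_list, add_name)
    ```

    Because imap() supplies the name as the second argument, no quoting or unquoting is needed here; the name arrives as an ordinary character string.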
    

    This gives a list of the desired data frames. If you want them as separate objects in the global environment, you can do

    names(df_out) <- paste0(names(df_out), "_mod")
    list2env(df_out, .GlobalEnv)
    

    We can also do it using base R Map():

    df_out <- Map(function(x, y) transform(x, name = y)[c('name', names(x))],
                  df_list, names(df_list))
    

    and then rename the list elements in the same way as above.
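
    The approach from the question, mapping over the character vector table_names, can also be made to work without any tidy evaluation. The sticking point is that $ does not evaluate its right-hand side, so df_list$x looks for an element literally called x; df_list[[x]] looks up the element whose name is stored in x. The sketch below assumes that no table has a column named x, and add_name_by_key() is a hypothetical name used only for illustration.

    ```r
    library(dplyr)
    library(purrr)
    library(tibble)

    table_names <- c("dfA", "dfB", "dfC")

    df_list <- list(
      tibble(a = 1:3, b = 4:6, c = 7:9),
      tibble(a = 10:12, b = 13:15, c = 16:18),
      tibble(a = 19:21, b = 22:24, c = 25:27)
    ) %>% setNames(table_names)

    # Hypothetical helper: x is a table name as a character string.
    # [[x]] extracts the table by that name; `$x` would not work here.
    add_name_by_key <- function(x) {
      df_list[[x]] %>%
        mutate(name = x) %>%
        select(name, everything())
    }

    # Name the input vector so that map() returns a list named dfA_mod, dfB_mod, ...
    new_list <- table_names %>%
      set_names(paste0(table_names, "_mod")) %>%
      map(add_name_by_key)
    ```

    Returning a named list, rather than calling assign() inside the function, keeps the helper free of side effects; list2env(new_list, .GlobalEnv) can still be used afterwards if separate objects are needed.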