
Using map with a custom function (that returns a data frame) and multiple inputs


I have a very simple function that returns a data frame. It takes three parameters: a dataset and the names of two variables in that dataset.

I was hoping to use the map/pmap family of functions to feed in a vector/list of inputs and produce a single (long) output dataset, but I can't get the map/pmap tools to work. What can I try next?

The function is pretty basic:

library(dplyr)
library(purrr)

# Function takes dataset and 2 categorical variables 
# Calculates number of records for each combination of values in the two variables
# Calculates % of var 1 responses for each level of var2
ugh<-function(data, var1, var2){
# add checks to make sure vars are in the dataset

  tab_n<-data%>%
    group_by_at(c(var1, var2))%>%
    summarise(Numerator=n(), .groups="drop")%>%
    group_by_at(c(var2))%>%
    mutate(Denominator=sum(Numerator)
           ,Pct=Numerator/Denominator*100
# storing names of var1 and var2 for future subsetting
           , Var1=var1 
           , Var2=var2)%>% 
    rename(Var1_levels=var1
           , Var2_levels=var2
           )
}


# Sample output 
combo1<-mtcars%>%ugh(var1="cyl", var2="gear")
# can also run this as: 
# combo1<-ugh(data=mtcars, var1="cyl", var2="gear")
combo2<-mtcars%>%ugh(var1="cyl", var2="carb")
sampleOutput<-rbind(combo1, combo2)

# Trying to use map to generate sampleOutput
var1_vector=rep("cyl", 2) 
var2_vector=c("gear", "carb")

plswork<-mtcars%>%
  map2_dfr(var1=var1_vector, var2=var2_vector, ugh)

The error message I get is:

Error in as_mapper(.f, ...) : argument ".f" is missing, with no default

I've tried using ~ to specify the function, using map2() and binding the rows separately, and using pmap() with a list of inputs, but I'm not having much luck.

(I am also interested in more efficient ways to summarise one subset of a data frame's columns by another subset of its columns.)
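Why the original call fails: in `mtcars %>% map2_dfr(var1 = var1_vector, var2 = var2_vector, ugh)` the pipe supplies `mtcars` as `.x`; `var1` and `var2` match none of `map2_dfr()`'s own arguments (`.x`, `.y`, `.f`, `...`, `.id`), so they are collected into `...`; and `ugh` is matched positionally to `.y`, which leaves `.f` missing. Below is a minimal sketch of the smallest change that makes the attempt work, assuming purrr 1.0.0 or later (where `map2_dfr()` is superseded but still exported). It restates the answer's revised `ugh()` so the block runs on its own.

```r
library(dplyr)
library(purrr)  # map2_dfr() is superseded in purrr >= 1.0.0 but still exported

# Same logic as the answer's revised ugh(), restated so this block runs on its own
ugh <- function(data, var1, var2) {
  data %>%
    group_by(across(all_of(c(var1, var2)))) %>%
    summarise(Numerator = n(), .groups = "drop") %>%
    group_by(across(all_of(var2))) %>%
    mutate(Denominator = sum(Numerator),
           Pct = Numerator / Denominator * 100,
           Var1 = .env$var1,
           Var2 = .env$var2) %>%
    rename(Var1_levels = all_of(var1), Var2_levels = all_of(var2))
}

var1_vector <- rep("cyl", 2)
var2_vector <- c("gear", "carb")

# Iterate over the two vectors (.x, .y) and pass ugh as .f; data = mtcars is
# forwarded through `...`, so each call is ugh(.x[[i]], .y[[i]], data = mtcars),
# which R matches as data = mtcars, var1 = .x[[i]], var2 = .y[[i]].
plswork <- map2_dfr(var1_vector, var2_vector, ugh, data = mtcars)
```

Because `data` is supplied by name, the two positional values fill `var1` and `var2` in order. Note that `map2_dfr()` relies on dplyr for row binding, which is already loaded here.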


Solution

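  • There are a few solutions to this problem. I recommend the first because I think it's the clearest. I also tweaked your function to replace superseded/deprecated functions and behaviour: group_by_at() becomes group_by(across(all_of(...))), and the bare character variables passed to rename() are wrapped in all_of().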

    library(dplyr)
    library(purrr)  # list_rbind() needs purrr >= 1.0.0
    
    # Function takes dataset and 2 categorical variables 
    # Calculates number of records for each combination of values in the two variables
    # Calculates % of var 1 responses for each level of var2
    ugh<-function(data, var1, var2){
      # add checks to make sure var1 and var2 are columns of data
      
      data%>%
        group_by(across(all_of(c(var1, var2))))%>%
        summarise(Numerator=n(), .groups="drop")%>%
        group_by(across(all_of(var2)))%>%
        mutate(Denominator=sum(Numerator)
               ,Pct=Numerator/Denominator*100
               # storing names of var1 and var2 for future subsetting
               , Var1= .env$var1 
               , Var2= .env$var2)%>% 
        rename(Var1_levels= all_of(var1)
               , Var2_levels= all_of(var2)
        )
    }
    
    # Sample output 
    combo1<-mtcars%>%ugh(var1="cyl", var2="gear")
    # can also run this as: 
    # combo1<-ugh(data=mtcars, var1="cyl", var2="gear")
    combo2<-mtcars%>%ugh(var1="cyl", var2="carb")
    
    sampleOutput<-rbind(combo1, combo2)
    
    sampleOutput
    #> # A tibble: 17 × 7
    #> # Groups:   Var2_levels [7]
    #>    Var1_levels Var2_levels Numerator Denominator    Pct Var1  Var2 
    #>          <dbl>       <dbl>     <int>       <int>  <dbl> <chr> <chr>
    #>  1           4           3         1          15   6.67 cyl   gear 
    #>  2           4           4         8          12  66.7  cyl   gear 
    #>  3           4           5         2           5  40    cyl   gear 
    #>  4           6           3         2          15  13.3  cyl   gear 
    #>  5           6           4         4          12  33.3  cyl   gear 
    #>  6           6           5         1           5  20    cyl   gear 
    #>  7           8           3        12          15  80    cyl   gear 
    #>  8           8           5         2           5  40    cyl   gear 
    #>  9           4           1         5           7  71.4  cyl   carb 
    #> 10           4           2         6          10  60    cyl   carb 
    #> 11           6           1         2           7  28.6  cyl   carb 
    #> 12           6           4         4          10  40    cyl   carb 
    #> 13           6           6         1           1 100    cyl   carb 
    #> 14           8           2         4          10  40    cyl   carb 
    #> 15           8           3         3           3 100    cyl   carb 
    #> 16           8           4         6          10  60    cyl   carb 
    #> 17           8           8         1           1 100    cyl   carb
    
    # Using map2()/pmap() to generate sampleOutput
    var1_vector=rep("cyl", 2) 
    var2_vector=c("gear", "carb")
    
    # Method 1 (recommended): an anonymous function with the \(x) shorthand (R >= 4.1.0)
    map2(var1_vector, var2_vector, \(var1, var2) ugh(mtcars, var1, var2)) %>%
      list_rbind()
    #> (identical to sampleOutput above: a 17 × 7 tibble grouped by Var2_levels)
    
    # Method 2: for R versions before 4.1.0 (no \(x) shorthand), use a purrr formula lambda:
    map2(var1_vector, var2_vector, ~ ugh(mtcars, .x, .y)) %>%
      list_rbind()
    #> (identical to sampleOutput above: a 17 × 7 tibble grouped by Var2_levels)
    
    # Method 3: pmap(); the names in args must match ugh()'s argument names
    args <- list(
      var1 = var1_vector,
      var2 = var2_vector
    )
    
    pmap(args, ugh, data = mtcars) %>%
      list_rbind()
    #> (identical to sampleOutput above: a 17 × 7 tibble grouped by Var2_levels)
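For the follow-up about summarising one set of columns by another, one option is to build every (var1, var2) pair with `tidyr::expand_grid()` and hand the resulting data frame straight to `pmap()`, since a data frame is a list of equal-length columns. This is a sketch under stated assumptions: `row_vars` and `col_vars` are hypothetical example choices, tidyr 1.0.0 or later provides `expand_grid()`, and all chosen columns share one type (all are numeric in `mtcars`), because `list_rbind()` stacks them into the single `Var1_levels` and `Var2_levels` columns.

```r
library(dplyr)
library(purrr)  # list_rbind() needs purrr >= 1.0.0
library(tidyr)  # expand_grid()

# Same logic as the answer's revised ugh(), restated so this block runs on its own
ugh <- function(data, var1, var2) {
  data %>%
    group_by(across(all_of(c(var1, var2)))) %>%
    summarise(Numerator = n(), .groups = "drop") %>%
    group_by(across(all_of(var2))) %>%
    mutate(Denominator = sum(Numerator),
           Pct = Numerator / Denominator * 100,
           Var1 = .env$var1,
           Var2 = .env$var2) %>%
    rename(Var1_levels = all_of(var1), Var2_levels = all_of(var2))
}

# Hypothetical column choices: summarise each row_vars column by each col_vars column
row_vars <- c("cyl", "am")
col_vars <- c("gear", "carb")

# One row per (var1, var2) pair; the column names match ugh()'s argument names,
# so pmap() passes each row as var1 = ..., var2 = ...
combos <- expand_grid(var1 = row_vars, var2 = col_vars)

# data = mtcars is forwarded to every call through `...`
all_tabs <- pmap(combos, ugh, data = mtcars) %>%
  list_rbind()
```

Any single table can then be pulled back out by filtering `all_tabs` on its `Var1` and `Var2` columns. If the chosen columns had mixed types, one option would be to convert the level columns to character inside `ugh()` before binding.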