
Applying a custom function to a list of DFs, taking another list as input - R


I have a list of dfs and a list of annual budgets. Each df represents one business year, and each budget represents a total spend for that year.

# the business year starts from Feb and ends in Jan.
# the monthly_budget column first holds each month's share of the annual budget (the shares sum to 1)

df <- data.frame(monthly_budget=c(0.06, 0.13, 0.07, 0.06, 0.1, 0.06, 0.06, 0.09, 0.06, 0.06, 0.1, 0.15),
          month=month.abb[c(2:12, 1)])

# dfs for 3 years
df2019_20 <- df
df2020_21 <- df
df2021_22 <- df

# budgets for 3 years
budget2019_20 <- 6000000
budget2020_21 <- 7000000
budget2021_22 <- 8000000

# into lists
df_list <- list(df2019_20, df2020_21, df2021_22)
budget_list <- list(budget2019_20, budget2020_21, budget2021_22)

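I've written the following function, which derives the year from the data frame's name with deparse(substitute()): January gets the second year of the business year and every other month gets the first. It also multiplies each monthly share by the annual budget. It works perfectly if I supply a single df and a single budget.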

library(dplyr)
library(stringr)

budget_func <- function(df, budget){ 
  
  # name of the object passed in, e.g. "df2019_20"
  df_name <- deparse(substitute(df))
  
  df <- df %>%
    mutate(year=ifelse(month=="Jan",
                       as.numeric(str_sub(df_name, -2)) + 2000,
                       as.numeric(str_extract(df_name, "\\d{4}(?=_)")))
    )
  
  # multiply each month's share by the annual budget
  for (i in 1:12){ 
    df[i,1] <- df[i,1] * budget
  }
  return(df)
}
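As a minimal sketch of the two steps inside budget_func (assuming stringr is installed; df_name, shares, looped and vectorized are illustrative names, not from the question), the year values come purely from parsing the name string, and the row loop gives the same numbers as a single vectorized multiplication:

```r
library(stringr)

# Illustrative name of one business-year data frame
df_name <- "df2019_20"

# Year for Feb-Dec: the four digits directly before the underscore
start_year <- as.numeric(str_extract(df_name, "\\d{4}(?=_)"))
# Year for Jan: the last two characters of the name, plus 2000
end_year <- as.numeric(str_sub(df_name, -2)) + 2000
c(start_year, end_year)  # 2019 2020

# Monthly shares from the question; the loop scales column 1 one row at a time
shares <- c(0.06, 0.13, 0.07, 0.06, 0.1, 0.06, 0.06, 0.09, 0.06, 0.06, 0.1, 0.15)
df <- data.frame(monthly_budget = shares, month = month.abb[c(2:12, 1)])

looped <- df
for (i in 1:12) {
  looped[i, 1] <- looped[i, 1] * 6000000
}

# The same column computed in one vectorized step
vectorized <- df
vectorized$monthly_budget <- vectorized$monthly_budget * 6000000

identical(looped$monthly_budget, vectorized$monthly_budget)  # TRUE
```

The vectorized form avoids hard-coding 12 rows and the column position, which is why it is usually preferred in R.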

To speed things up, I want to pass both lists as arguments to mapply instead of calling the function once per year. However, I don't get the results I want. What am I doing wrong?

final_budgets <- mapply(budget_func, df_list, budget_list)

Solution

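  • deparse(substitute(df)) only recovers a name when the object is passed directly, as in budget_func(df2019_20, budget2019_20). Inside mapply/Map, the argument is an element extracted from the list (an indexing expression along the lines of dots[[1L]][[1L]]), not the object's name, so both year regexes return NA. Instead, add a new argument that carries the name, and give the list names when you create it: use list(df2019_20 = df2019_20, ...), setNames(), or, more simply, dplyr::lst(), which names each element after the object passed. Using Map (equivalent to mapply(..., SIMPLIFY = FALSE)) also keeps each result as a data frame instead of letting mapply simplify them into a matrix.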

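    library(dplyr)
    library(stringr)

    # nm1: the business-year name, e.g. "df2019_20", passed in explicitly
    budget_func <- function(df, budget, nm1){
      df_name <- nm1

      df <- df %>%
        mutate(year = ifelse(month == "Jan",
                             as.numeric(str_sub(df_name, -2)) + 2000,             # Jan: second year
                             as.numeric(str_extract(df_name, "\\d{4}(?=_)"))))  # other months: first year

      # multiply each month's share by the annual budget
      for (i in 1:12){
        df[i, 1] <- df[i, 1] * budget
      }
      return(df)
    }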
    

    -testing

    df_list <- dplyr::lst(df2019_20, df2020_21, df2021_22)
    budget_list <- list(budget2019_20, budget2020_21, budget2021_22)
    Map(budget_func, df_list, budget_list, names(df_list))     
    

    -output

    $df2019_20
       monthly_budget month year
    1          360000   Feb 2019
    2          780000   Mar 2019
    3          420000   Apr 2019
    4          360000   May 2019
    5          600000   Jun 2019
    6          360000   Jul 2019
    7          360000   Aug 2019
    8          540000   Sep 2019
    9          360000   Oct 2019
    10         360000   Nov 2019
    11         600000   Dec 2019
    12         900000   Jan 2020
    
    $df2020_21
       monthly_budget month year
    1          420000   Feb 2020
    2          910000   Mar 2020
    3          490000   Apr 2020
    4          420000   May 2020
    5          700000   Jun 2020
    6          420000   Jul 2020
    7          420000   Aug 2020
    8          630000   Sep 2020
    9          420000   Oct 2020
    10         420000   Nov 2020
    11         700000   Dec 2020
    12        1050000   Jan 2021
    
    $df2021_22
       monthly_budget month year
    1          480000   Feb 2021
    2         1040000   Mar 2021
    3          560000   Apr 2021
    4          480000   May 2021
    5          800000   Jun 2021
    6          480000   Jul 2021
    7          480000   Aug 2021
    8          720000   Sep 2021
    9          480000   Oct 2021
    10         480000   Nov 2021
    11         800000   Dec 2021
    12        1200000   Jan 2022
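To see why the original mapply(budget_func, df_list, budget_list) call misbehaved, here is a small diagnostic sketch, assuming dplyr and stringr are installed; show_name and scale_budget are hypothetical helpers, not part of the question. It shows that deparse(substitute()) yields the object name only in a direct call, that the three naming options above agree, and that mapply's default SIMPLIFY = TRUE turns multi-column data frames into a list-matrix, which Map avoids.

```r
library(stringr)

# Hypothetical helper: return the expression passed as x, as text
show_name <- function(x) deparse(substitute(x))

df2019_20 <- data.frame(monthly_budget = c(0.5, 0.5), month = c("Feb", "Jan"))

# Direct call: substitute() sees the object's name
direct <- show_name(df2019_20)
str_extract(direct, "\\d{4}(?=_)")      # "2019"

# Under mapply: substitute() sees an indexing expression, so no year is found
via_mapply <- mapply(show_name, list(df2019_20))
str_extract(via_mapply, "\\d{4}(?=_)")  # NA

# Three ways to name the list up front; all give "df2019_20"
n1 <- names(list(df2019_20 = df2019_20))
n2 <- names(setNames(list(df2019_20), "df2019_20"))
n3 <- names(dplyr::lst(df2019_20))

# Hypothetical scaling helper standing in for budget_func
scale_budget <- function(d, b) {
  d$monthly_budget <- d$monthly_budget * b
  d
}

# Default SIMPLIFY = TRUE: two 2-column data frames become a 2 x 2 list-matrix
simplified <- mapply(scale_budget, list(df2019_20, df2019_20), list(10, 20))
is.matrix(simplified)  # TRUE

# Map (mapply with SIMPLIFY = FALSE) keeps each result as a data frame
kept <- Map(scale_budget, list(df2019_20, df2019_20), list(10, 20))
is.data.frame(kept[[2]])  # TRUE
```

Passing the names explicitly, as in the answer, sidesteps the first problem entirely; Map sidesteps the second.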