
Avoiding Loops in R for Accumulating Function Values


Say I have some example data containing the location, condition, and replacement cost of equipment:

set.seed(0)

n <- 10

machine_data <- data.frame(c(1:n), sample.int(2, n, replace=TRUE), runif(n, min=50, max=100), runif(n, min=20000, max=100000))
colnames(machine_data) <- c("ID", "Location", "Condition", "ReplaceCost")

I want to determine which course of action (Do Nothing, Repair, Replace) should be assigned to each $Location over successive periods. The possible permutations of these choices are:

actions <- c("do_nothing", "repair", "replace")
periods <- 3
perms <- gtools::permutations(n=length(actions), r=periods, v=actions, repeats.allowed = T)

So in this case, I have 27 possible permutations of actions for each $Location and I collect these different possibilities into a dataframe:

n <- length(unique((machine_data$Location)))
decisions <- do.call(rbind, replicate(n, perms, simplify=FALSE))
df <- data.frame(rep(c(1:n), each=nrow(perms)), decisions)
action_labels <- paste("Period", c(1:periods))
colnames(df) <- c("Location", action_labels)
df$Improvement <- 0
df$Cost <- 0
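
As a quick sanity check on the count of 27, here is a minimal base-R sketch that builds the same set of sequences without gtools. It assumes the same actions and periods as above; the name grid is only illustrative. Note that expand.grid() varies the first column fastest, so the row order will differ from gtools::permutations().

```r
# Base-R sketch: every ordered sequence of actions over the periods, repeats allowed
actions <- c("do_nothing", "repair", "replace")
periods <- 3

# One column per period, one row per sequence of actions
grid <- expand.grid(rep(list(actions), periods), stringsAsFactors = FALSE)
names(grid) <- paste("Period", seq_len(periods))

nrow(grid)               # 27 rows, one per sequence
length(actions)^periods  # 3^3 = 27, the count of sequences with repetition
```

In general the number of sequences is length(actions)^periods, which is why the table grows quickly as periods are added.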

The idea is to calculate the relative improvement and cost of each permutation. These values are then sent to an optimization model that selects the best course of action for each $Location given a budget constraint. The function that performs a single action is:

library(dplyr)  # provides %>% and mutate(), used in perform_action()
replace_threshold <- 50
repair_threshold <- 85
perform_action <- function(action, location_data, repair_threshold, replace_threshold)
{
  # repair decisions and resulting condition
  location_data <- location_data %>%
    # repair results in condition of 95; replace results in 100
    mutate(new_cond = ifelse(action == "repair" & Condition <= repair_threshold, 95, 
                      ifelse(action == "replace" & Condition <= replace_threshold, 100, Condition))
           )
  # the total score for a location is a cost-weighted aggregation of individual equipment
  location_score <- sum(location_data$new_cond * location_data$ReplaceCost)/sum(location_data$ReplaceCost)
  
  return(location_score)
}
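
To make the scoring concrete, here is a small worked example on an invented two-machine location. It is a sketch only: the toy data and the helper names new_condition() and score() are hypothetical, and the helpers are a base-R restatement of the logic in perform_action() rather than the function itself.

```r
# Hypothetical two-machine location (values invented for illustration)
toy <- data.frame(Condition = c(40, 90), ReplaceCost = c(1000, 3000))

# Base-R restatement of the nested ifelse() in perform_action()
new_condition <- function(action, condition,
                          repair_threshold = 85, replace_threshold = 50) {
  if (action == "repair")  return(ifelse(condition <= repair_threshold, 95, condition))
  if (action == "replace") return(ifelse(condition <= replace_threshold, 100, condition))
  condition  # any other action leaves the condition unchanged
}

# Cost-weighted mean condition: sum(cond * cost) / sum(cost)
score <- function(action, data) {
  weighted.mean(new_condition(action, data$Condition), data$ReplaceCost)
}

score("do_nothing", toy)  # (40*1000  + 90*3000) / 4000 = 77.5
score("repair", toy)      # (95*1000  + 90*3000) / 4000 = 91.25
score("replace", toy)     # (100*1000 + 90*3000) / 4000 = 92.5
```

Only the first machine is below either threshold, so only its condition changes; the second machine keeps its condition of 90 under every action.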

I can do this row-wise in a loop:

for(i in 1:n)
{
  location_data <- machine_data[machine_data$Location == i,]
  for(r in 1:nrow(df))
  {
    if(df$Location[r] == i)
    {
      action_set <- df[r,2:(length(actions)+1)]
      for(a in 1:length(action_set))
      {
        action <- toString(action_set[a])
        df$Improvement[r] <- perform_action(action, location_data, repair_threshold, replace_threshold)
        # calculate df$Cost here...
        # update machine_data$Condition for next period here...
      }
    }
  }
}

However, this becomes slow for larger problems, and I'm looking for a way to speed it up. Is it possible to use something like purrr's accumulate() function to execute each of the actions in action_set in the loop above? I'm open to non-tidyverse approaches too; accumulate() is just one option that I've tried to get working without success.


Solution

  • Your example loop doesn't appear to actually accumulate the improvements across periods; it just overwrites the previous period's value with the next one. That means I can't check my answer against your example data, but I think what you want is reduce() rather than accumulate().

    Take your inner loop:

    action_set <- df[r,2:(length(actions)+1)]
    for(a in 1:length(action_set)){
        action <- toString(action_set[a])
        df$Improvement[r] <- perform_action(action, location_data, repair_threshold, replace_threshold)
    }
    

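    In its place, accumulate() gives you a vector holding the result of each successive calculation (this assumes purrr is attached, since as_vector() and %>% are used unqualified):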

    purrr::accumulate(
        #First grab the actions from the row as a vector
        df[1, 2:4] %>% as_vector(),
        #Then pass them to an anonymous function. x contains the accumulated value, while y is the action
        #I've set the location to 1 as if this were one run through the outer loop
        function(x,y) {
        #Set the accumulated value to the prior value plus the calculated action
            x <- x + perform_action(y, machine_data[machine_data$Location == 1,],
                repair_threshold, replace_threshold)
            return(x)
        },
        #Set the initial value of x to 0
        .init = 0)
    #>     .init  Period 1  Period 2  Period 3 
    #>   0.00000  76.79909 153.59818 230.39728
    

    As you can see, it returns every intermediate value. I think what you want is just the final value, which you get with reduce():

    purrr::reduce(df[1, 2:4] %>% as_vector(),
        function(x,y) {
            x <- x + perform_action(y, machine_data[machine_data$Location == 1,],
                repair_threshold, replace_threshold)
            return(x)
        }, .init = 0)
    #> [1] 230.3973
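
If the condition should carry over from one period to the next (the "update machine_data$Condition" comment in the question), one option is to make the accumulator the location data itself rather than a running sum. The following is a minimal sketch under that assumption, written with base R's Reduce() so it runs without extra packages. The toy data and the helpers apply_action() and score() are hypothetical, not part of the original code.

```r
# Hypothetical location data (values invented for illustration)
location_data <- data.frame(Condition   = c(40, 70, 90),
                            ReplaceCost = c(1000, 2000, 3000))

# Apply one action and return the updated data, so the next period sees it
apply_action <- function(data, action,
                         repair_threshold = 85, replace_threshold = 50) {
  if (action == "repair") {
    data$Condition <- ifelse(data$Condition <= repair_threshold, 95, data$Condition)
  } else if (action == "replace") {
    data$Condition <- ifelse(data$Condition <= replace_threshold, 100, data$Condition)
  }
  data
}

# Cost-weighted condition score of a location
score <- function(data) weighted.mean(data$Condition, data$ReplaceCost)

action_set <- c("replace", "do_nothing", "repair")

# accumulate = TRUE keeps the state after every period (initial state first)
states <- Reduce(apply_action, action_set, location_data, accumulate = TRUE)
scores <- vapply(states, score, numeric(1))
scores  # 75, 85, 85, then about 93.33

# Without accumulate, only the final state is returned
final_score <- score(Reduce(apply_action, action_set, location_data))

# One final score per candidate sequence (rows of a small decisions matrix)
decisions <- rbind(c("do_nothing", "do_nothing", "do_nothing"),
                   c("replace", "do_nothing", "repair"))
final_scores <- apply(decisions, 1, function(a) {
  score(Reduce(apply_action, a, location_data))
})
```

Because purrr::accumulate() and purrr::reduce() also call the function as f(accumulator, element), accumulate(action_set, apply_action, .init = location_data) and reduce(action_set, apply_action, .init = location_data) should be drop-in equivalents of the two Reduce() calls. Whether this is faster than the original loop depends on the problem size, so it is worth timing both.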