
What functions in R can recursively "reduce" the rows of a dataframe?


I'm thinking of a function like Reduce(), but one that accepts a dataframe instead of a vector, along with a function that takes an accumulator and each row of the dataframe.

Consider the following example, which creates a dataframe containing the price and quantity of each purchase and then uses Reduce() to calculate the running total cost.

purchases = data.frame(
  price = c(1.50, 1.75, 2.00, 2.10, 1.80),
  quantity = c(100, 80, 50, 20, 90)
)
print(purchases)
#>   price quantity
#> 1  1.50      100
#> 2  1.75       80
#> 3  2.00       50
#> 4  2.10       20
#> 5  1.80       90
purchase_costs <- purchases$quantity * purchases$price
print(purchase_costs)
#> [1] 150 140 100  42 162
total_cost <- Reduce(
  function(total_cost, cost) { total_cost + cost },
  purchase_costs,
  accumulate = TRUE
)
print(total_cost)
#> [1] 150 290 390 432 594

Created on 2022-02-01 by the reprex package (v2.0.1)
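To make the role of accumulate = TRUE concrete, here is a small sketch that re-creates the cost vector from the printed output above. It compares the default (only the final accumulator value) with the accumulated run. For plain addition, the accumulated result matches base R's cumsum(). The names final_total and running_total are illustrative only.

```r
# Per-purchase costs, copied from the printed output above
purchase_costs <- c(150, 140, 100, 42, 162)

# Default accumulate = FALSE: only the final accumulator value is returned
final_total <- Reduce(function(total_cost, cost) total_cost + cost, purchase_costs)
print(final_total)
# [1] 594

# accumulate = TRUE: every intermediate accumulator value is kept
running_total <- Reduce(function(total_cost, cost) total_cost + cost,
                        purchase_costs, accumulate = TRUE)

# For plain addition this is the same running sum that cumsum() computes
print(all.equal(running_total, cumsum(purchase_costs)))
# [1] TRUE
```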

Which functions in R, similar to Reduce(), could calculate this running total cost by recursively processing each purchase (row) in the dataframe rather than each cost in a vector of costs? Such a function might be called like this:

total_cost <- Reduce(
  function(total_cost, purchase) { total_cost + purchase["quantity"] * purchase["price"] },
  purchases,
  accumulate = TRUE
)
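Passing the data frame straight to Reduce() does not give row-wise steps. Reduce() converts objects such as data frames with as.list(), and for a data frame that returns its columns. The accumulator function therefore receives whole columns. The sketch below shows this with the same purchases data. The name steps is illustrative.

```r
purchases <- data.frame(
  price = c(1.50, 1.75, 2.00, 2.10, 1.80),
  quantity = c(100, 80, 50, 20, 90)
)

# A data frame is a list of columns: as.list() yields one element per column
str(as.list(purchases))

# So Reduce() walks the columns, not the rows:
# step 1 is the price column, step 2 is price + quantity (element-wise)
steps <- Reduce(function(acc, column) acc + column, purchases, accumulate = TRUE)
print(length(steps))
# [1] 2
```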

Solution

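  • Reduce by itself isn't going to operate row-wise like you want. It works well on a simple vector or list, but a data frame is a list of columns, so Reduce() steps through the columns rather than the rows.

    Try this frame-aware function. It evaluates an expression once per row, with that row's columns and the running accumulator init in scope: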

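    Reduce_frame <- function(data, expr, init) {
      # capture the expression unevaluated so it can be re-run for every row
      expr <- substitute(expr)
      # preallocate a result of NAs with the same type as init
      out <- rep(init[1][NA], nrow(data))
      for (rn in seq_len(nrow(data))) {
        # names are looked up in the current row first, then in this function's
        # frame, where init lives (so a column named init would shadow it)
        out[rn] <- init <- eval(expr, envir = data[rn,])
      }
      out
    }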
    
    Reduce_frame(purchases, init + quantity*price, init=0)
    # [1] 150 290 390 432 594
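A base-R alternative, sketched here rather than taken from the answer above: split() the frame into a list of one-row data frames and give that list to Reduce(). Reduce() handles lists fine, so each step then sees one purchase. The name purchase_rows is illustrative. Because an explicit init is supplied, accumulate = TRUE also returns that starting value as the first element, so the sketch drops it.

```r
purchases <- data.frame(
  price = c(1.50, 1.75, 2.00, 2.10, 1.80),
  quantity = c(100, 80, 50, 20, 90)
)

# One list element per row: each element is a one-row data frame
purchase_rows <- split(purchases, seq_len(nrow(purchases)))

# The accumulator function now receives one purchase (row) per step
total_cost <- Reduce(
  function(total_cost, purchase) total_cost + purchase$quantity * purchase$price,
  purchase_rows,
  init = 0,
  accumulate = TRUE
)

# With an explicit init, the accumulated result starts with init, so drop it
total_cost <- total_cost[-1]
print(total_cost)
# [1] 150 290 390 432 594
```

One trade-off of this approach is that split() builds every row subset up front, whereas Reduce_frame() subsets one row per loop iteration.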