Search code examples
roptimizationgreedy

Greedy optimization in R


I am trying to replicate Caruana et al.'s method for Ensemble selection from libraries of models (pdf). At the core of the method is a greedy algorithm for adding models to the ensemble (models can be added more than once). I've written an implementation for this greedy optimization algorithm, but it is very slow:

library(compiler)
set.seed(42)
X <- matrix(runif(100000*10), ncol=10)
Y <- rnorm(100000)

greedOpt <- cmpfun(function(X, Y, iter=100){
  weights <- rep(0, ncol(X))

  while(sum(weights) < iter) {

    errors <- sapply(1:ncol(X), function(y){
      newweights <- weights
      newweights[y] <- newweights[y] + 1  
      pred <- X %*% (newweights)/sum(newweights)
      error <- Y - pred
      sqrt(mean(error^2))
    })

    update <- which.min(errors)
    weights[update] <- weights[update]+1
  }
  return(weights/sum(weights))
})

system.time(a <- greedOpt(X,Y))

I know R doesn't do loops well, but I can't think of any way to do this type of stepwise search without a loop.

Any suggestions for improving this function?


Solution

  • Here is an R implementation that is 30% faster than yours. Not as fast as your Rcpp version but maybe it will give you ideas that combined with Rcpp will speed things further. The two main improvements are:

    1. the sapply loop has been replaced by a matrix formulation
    2. the matrix multiplication has been replaced by a recursion

    greedOpt <- cmpfun(function(X, Y, iter = 100L){
    
      N           <- ncol(X)
      weights     <- rep(0L, N)
      pred        <- 0 * X
      sum.weights <- 0L
    
      while(sum.weights < iter) {
    
          sum.weights   <- sum.weights + 1L
          pred          <- (pred + X) * (1L / sum.weights)
          errors        <- sqrt(colSums((pred - Y) ^ 2L))
          best          <- which.min(errors)
          weights[best] <- weights[best] + 1L
          pred          <- pred[, best] * sum.weights
      }
      return(weights / sum.weights)
    })
    

    Also, I maintain you should try upgrading to the atlas library. You might see significant improvements.