A faster R implementation of average precision at N


The excellent Metrics package provides a function to calculate average precision: apk.

The problem is, it's based on a for loop, and it's slow:

library(Metrics)     # provides apk()
library(rbenchmark)  # provides benchmark()
actual <- 1:20000
predicted <- c(1:20, 200:600, 900:1522, 14000:32955)
# Time 10 runs of apk() at k = 5000
benchmark(replications=10,
          apk(5000, actual, predicted),
          columns= c("test", "replications", "elapsed", "relative"))

                          test replications elapsed relative
1 apk(5000, actual, predicted)           10   53.68        1

I'm at a loss as to how to vectorize this function. Is there a better way to implement it in R?


Solution

  • I'd have to agree the implementation looked pretty bad... Try this instead:

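    apk2 <- function (k, actual, predicted)  {
    
        # Only the top k predictions count
        predicted <- head(predicted, k)
    
        # TRUE at the first occurrence of each predicted value, FALSE for repeats
        is.new <- rep(FALSE, length(predicted))
        is.new[match(unique(predicted), predicted)] <- TRUE
    
        # A hit is a first-time prediction that appears in actual
        is.relevant <- predicted %in% actual & is.new
    
        # cumsum() gives the running hit count; dividing by the rank gives
        # precision at each position, and non-hits are zeroed out
        score <- sum(cumsum(is.relevant) * is.relevant / seq_along(predicted)) /
                 min(length(actual), k)
        score
    }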
    
    benchmark(replications=10,
              apk(5000, actual, predicted),
              apk2(5000, actual, predicted),
              columns= c("test", "replications", "elapsed", "relative"))
    
    #                            test replications elapsed relative
    # 1  apk(5000, actual, predicted)           10  62.194 2961.619
    # 2 apk2(5000, actual, predicted)           10   0.021    1.000
    
    identical(apk(5000, actual, predicted),
              apk2(5000, actual, predicted))
    # [1] TRUE
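
To see why the vectorized formula matches a loop, it can help to trace a tiny input by hand. The sketch below is illustrative only: `apk_loop` is a hypothetical loop-style reference written for this example (it is not copied from the Metrics source), `apk_vec` restates the logic of `apk2` using `!duplicated()` as an assumed equivalent of the `is.new` vector, and the small `actual_small` / `predicted_small` vectors are made up.

```r
# Hypothetical loop-style reference: walk the ranks one at a time.
apk_loop <- function(k, actual, predicted) {
  score <- 0
  hits <- 0
  for (i in seq_len(min(k, length(predicted)))) {
    # A repeated prediction must not be counted twice
    seen_before <- predicted[i] %in% predicted[seq_len(i - 1)]
    if (predicted[i] %in% actual && !seen_before) {
      hits <- hits + 1
      score <- score + hits / i   # precision at rank i
    }
  }
  score / min(length(actual), k)
}

# Vectorized restatement of the same computation.
apk_vec <- function(k, actual, predicted) {
  predicted <- head(predicted, k)
  # !duplicated() marks the first occurrence of each value
  is_relevant <- predicted %in% actual & !duplicated(predicted)
  sum(cumsum(is_relevant) * is_relevant / seq_along(predicted)) /
    min(length(actual), k)
}

# Made-up example data: the second 2 is a duplicate, 4 is a miss.
actual_small <- c(1, 2, 3)
predicted_small <- c(1, 4, 2, 2, 3)

# Hits fall at ranks 1, 3 and 5, so the sum is 1/1 + 2/3 + 3/5,
# divided by min(length(actual_small), 5) = 3.
result_loop <- apk_loop(5, actual_small, predicted_small)
result_vec <- apk_vec(5, actual_small, predicted_small)
print(result_vec)
print(all.equal(result_loop, result_vec))
```

The comparison uses `all.equal()` rather than `identical()` because the loop and `sum()` may accumulate floating-point terms in a different order.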