Tags: r, apriori, arules

How to use arules to identify top n recommended items and their rules?


While head() can be used to extract the top n rules, some RHS items may appear multiple times. I'd like to find the top n unique RHS items as well as the top rule for each such item.

I've written code that accomplishes this, but it runs very slowly, presumably because of the 'subset' function, which is very inefficient. My code iterates over the unique RHS items, finds the subset of rules for each item, and returns that item's single top rule. Is this an effective or efficient way of doing it? Is there a better way?

library(arules)
data("Groceries")
rules = apriori(Groceries,
                parameter = list(supp = 0.01, conf = 0.1, target = "rules"),
                appearance = list(lhs=c("whole milk", "root vegetables"), default="rhs"))

rules = sort(rules, by=c("confidence", "lift", "support"))
rhs.unique = unique(rules@rhs@itemInfo$labels[rules@rhs@data@i+1]) #Already sorted by top items.

#Function that returns the top rule for a particular RHS item in a set of rules.
top_item_rule = function(item, rules=NULL) {
  rules = subset(rules, rhs %in% item)
  rules = sort(rules, by=c("confidence", "lift", "support"))
  head(rules, n=1)
}

n = 3
toprules = lapply(rhs.unique[1:n], top_item_rule, rules)
toprules = do.call(c, args=toprules)

Solution

  • How about this?

    rules <- sort(rules, by=c("confidence", "lift", "support"))
    rules[!duplicated(rhs(rules))]
    

    For each RHS, it returns the top rule (the first one after sorting).
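
To limit the result to the top n RHS items, as the question asks, the deduplicated rule set can probably just be passed through head(). The sketch below is a minimal example that reuses the mining call from the question; the names best_per_rhs and toprules are only illustrative, and it assumes an arules version in which sort() accepts several measures in 'by', as the question's own code does.

```r
library(arules)
data("Groceries")

# Same mining call as in the question.
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.1, target = "rules"),
                 appearance = list(lhs = c("whole milk", "root vegetables"),
                                   default = "rhs"))

# Sort once, so the best rule for each RHS item comes first.
rules <- sort(rules, by = c("confidence", "lift", "support"))

# Keep only the first (top-ranked) rule for each distinct RHS.
best_per_rhs <- rules[!duplicated(rhs(rules))]

# Take the first n of those; the sort order is preserved by the subsetting.
n <- 3
toprules <- head(best_per_rhs, n = n)
inspect(toprules)
```

Because the rules are sorted only once and duplicated() works on the whole RHS item matrix at once, this should avoid the repeated subset() and sort() calls made per item in the original loop.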