
In arules, return the lowest-support rules from a very large set of rules


I'm working with a huge data set that produces a ton of rules. I only need the high-lift, low-support rules, but I'm getting over 15 million of them (and this is after setting minlen/maxlen and cleaning up my source data).

What I'm trying to do now is take the top several million rules (a head) and subtract them from the full set of rules, in the hope that eventually only the bottom of the barrel is left.

Code:

basket_rules <- apriori(ttk,
                        parameter = list(sup = 0.03, conf = 0.25, target = "rules",
                                         minlen = 4, maxlen = 4, maxtime = 0),
                        appearance = list(rhs = "Fail: Generator Boot-up", default = "lhs"))

rules <- sort(basket_rules, by = "sup")
head1 <- head(rules, 2000000)
head2 <- rules[ !(rules %in% head1), ]
> summary(head2)
set of 0 rules
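Because `rules` is already sorted by support (highest first), one way to drop the first n rules is by position rather than by comparing rule objects with `%in%`. A minimal sketch, assuming the Groceries sample data as a stand-in for `ttk` and a hypothetical cut-off `n_drop` (half the rule set here) in place of 2,000,000:

```r
library(arules)
data(Groceries)

# Stand-in for basket_rules: mined from the Groceries sample data
# (thresholds chosen only so the example runs quickly).
rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.2),
                 control = list(verbose = FALSE))

# sort() on rules orders by decreasing support by default.
rules <- sort(rules, by = "support")

# Hypothetical cut-off standing in for the 2,000,000 in the question.
n_drop <- floor(length(rules) / 2)

# Keep every rule after the first n_drop, selected by position.
rest <- rules[seq(n_drop + 1, length(rules))]
```

Positional indexing only relies on the sort order, so it does not need to match millions of rules against each other.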

I also tried:

rules <- sort(basket_rules, by = "sup")
head1 <- head(rules, 2000000)
head2 <- rules[-head1,]
Error in -head1 : invalid argument to unary operator
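The unary minus fails because `head1` is a rules object, not a vector of numbers, so R has nothing to negate. If the real goal is the lowest-support rules, a simpler route is to sort in ascending order and take the head. A minimal sketch on the Groceries sample data (a stand-in for `ttk`; the size of 20 is arbitrary):

```r
library(arules)
data(Groceries)

# Stand-in rule set; thresholds are illustrative only.
rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.2),
                 control = list(verbose = FALSE))

# decreasing = FALSE puts the lowest-support rules first.
ascending <- sort(rules, by = "support", decreasing = FALSE)

# The 20 lowest-support rules.
bottom <- head(ascending, 20)
```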

I've used similar syntax while sampling, so I'm not sure why this isn't working. All I really need is the low-support apriori rules, and I feel like I may be making this far more complicated than it needs to be. Any suggestions on why my code isn't working, or how I can get the really low support/confidence rules?
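Note that `sup = 0.03` in `apriori()` is a minimum support threshold: no rule with support below 0.03 is ever generated, so even the "lowest" rules in the result sit at or above that value. A minimal sketch of filtering directly on the quality measures instead of subtracting heads, using the Groceries data as a stand-in for `ttk`; the 0.005 mining threshold and the `support < 0.01` / `lift > 2` cut-offs are illustrative assumptions only:

```r
library(arules)
data(Groceries)

# support is a minimum: rules below it are never generated.
rules <- apriori(Groceries,
                 parameter = list(support = 0.005, confidence = 0.2),
                 control = list(verbose = FALSE))

# Keep only low-support, high-lift rules (both cut-offs are hypothetical).
low_sup_high_lift <- subset(rules, subset = support < 0.01 & lift > 2)

# Order the survivors by lift, strongest first.
low_sup_high_lift <- sort(low_sup_high_lift, by = "lift")
```

Filtering with `subset()` works on the quality columns directly, so no sorting of the full rule set is needed before the cut.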


Solution

  • If I understand your question correctly, you could do this:

    library(arules)
    data(Groceries)
    
    rules <- apriori(Groceries, parameter = list(support= 0.0002))
    

    This produces about 2 million rules. Now you can get the 100 rules with the lowest support using tail:

    low_support_rules <- tail(rules, by = "support", n = 100)
    

    You can then sort these low-support rules by lift.
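A minimal sketch of that last step, with a higher support threshold (0.005) and a smaller n (20) than above so it runs quickly; both values are assumptions for illustration:

```r
library(arules)
data(Groceries)

rules <- apriori(Groceries,
                 parameter = list(support = 0.005, confidence = 0.2),
                 control = list(verbose = FALSE))

# The 20 rules with the lowest support.
low_support_rules <- tail(rules, by = "support", n = 20)

# Re-order those rules by lift, highest first.
by_lift <- sort(low_support_rules, by = "lift")

# Show the five strongest of the low-support rules.
inspect(head(by_lift, 5))
```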