I'm working with a huge data set that produces a ton of rules. I only need the high-lift, low-support rules, but I'm getting over 15 million (this is after setting min/maxlen and cleaning up my source data).
What I'm trying to do now is take a head of several million rules and subtract it from the full rule set. My hope is that eventually all that is left is the bottom of the barrel.
Code:
basket_rules <- apriori(ttk, parameter = list(sup = 0.03, conf = 0.25, target = "rules", minlen = 4, maxlen = 4, maxtime = 0), appearance = list(rhs = "Fail: Generator Boot-up", default = "lhs"))
rules <- sort(basket_rules, by = "sup")
head1 <- head(rules, 2000000)
head2 <- rules[ !(rules %in% head1), ]
> summary(head2)
>set of 0 rules
I also tried:
rules <- sort(basket_rules, by = "sup")
head1 <- head(rules, 2000000)
head2 <- rules[-head1,]
>Error in -head1 : invalid argument to unary operator
I've used similar syntax while sampling, so I'm not sure why this is not working. All I really need is to get to the low-support apriori rules, and I feel like I may be making this way more complicated than it needs to be. Any suggestions on why my code is not working, or how I can get the really low sup/conf rules?
I hope I understand your question correctly. I think you could do this:
library(arules)
data(Groceries)
rules <- apriori(Groceries, parameter = list(support= 0.0002))
This produces about 2 million rules. Now you can get the 100 rules with the lowest support using tail:
low_support_rules <- tail(rules, by = "support", n = 100)
Now you can sort the low-support rules by lift.
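A minimal sketch of that last step, plus an index-based version of the "subtract the head" idea from the question. The thresholds here (support = 0.001, confidence = 0.5) and the cut-off n_drop are assumptions picked only so the demo runs quickly; they are not values from the question. As far as I can tell, the `-head1` attempt fails because unary minus is not defined for a rules object, so negative indexing needs integer positions instead.

```r
library(arules)
data(Groceries)

# Assumption: higher thresholds than above, only to keep the demo small and fast.
rules <- apriori(Groceries,
                 parameter = list(support = 0.001, confidence = 0.5),
                 control = list(verbose = FALSE))

# The 100 rules with the lowest support, as in the answer above.
low_support_rules <- tail(rules, n = 100, by = "support")

# Order those rules by lift (sort() is decreasing by default) and look at a few.
by_lift <- sort(low_support_rules, by = "lift")
inspect(head(by_lift, n = 5))

# "Subtracting the head": sort once, then drop the first n_drop positions
# with a negative integer index rather than a rules object.
sorted <- sort(rules, by = "support")
n_drop <- min(1000, length(sorted))  # hypothetical cut-off
remainder <- sorted[-seq_len(n_drop)]
```

If only the low-support end is needed, the tail call alone should be enough; the index-based drop is just there to show what the original attempt was probably aiming for.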